likes
comments
collection
share

Mysql锁超时&死锁定位及解决

作者站长头像
站长
· 阅读数 15

前言

除了slow sql的处理,常见的还有锁超时和死锁的定位。

锁超时一般发生在两个update对同一行或者存在间隙锁的交集情况下,在业务中一个事务持有锁之后因为发生了IO阻塞或者等待或者处理其他逻辑耗时导致事务一直没有被提交,锁得不到释放。

死锁时一般发生在两个事务分别先后持有两个不同的锁。比如事务1先持有A锁,事务2先持有B锁,事务1后准备持有B锁,事务2后持有A锁,那么这两个事务后面打算持有的锁都被对方先持有了,那么就出现了死锁。

下面我们将对锁超时和死锁解释下怎么定位和解决,mysql版本为:

mysql> select version();

+-----------+

| version() |

+-----------+

| 8.0.27    |

+-----------+

1 row in set (0.00 sec)

测试表结构为:

| Table | Create Table                                                                                                                                                                                                                                                                                                                                                                                 |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
  `name` char(32) NOT NULL COMMENT '用户姓名',
  `num` int DEFAULT NULL,
  `phone` char(11) DEFAULT '' COMMENT '手机号',
  PRIMARY KEY (`id`),
  KEY `idx_name_phone` (`name`,`phone`)
) ENGINE=InnoDB AUTO_INCREMENT=106 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='test表'           |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.04 sec)
| Table | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| user  | CREATE TABLE `user` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
  `create_time` datetime(3) DEFAULT CURRENT_TIMESTAMP(3) COMMENT '创建时间',
  `update_time` datetime(3) DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3) COMMENT '更新时间',
  `is_delete` tinyint NOT NULL DEFAULT '0' COMMENT '是否删除 0:否 1:是',
  `phone` char(11) DEFAULT '' COMMENT '手机号',
  `user_name` char(64) NOT NULL COMMENT '用户名',
  `head` varchar(255) DEFAULT NULL COMMENT '用户头像',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='用户表'                              |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

提前需要了解的

关于Mysql InnoDB的各类锁请参考《Mysql InnoDB之各类锁》,里面有介绍怎么查看持有锁和请求锁的情况可以使用select * from performance_schema.data_locks进行查看这里就不做过多描述。查看有哪些线程正在执行:show processlist。查看当前运行的所有事务:select * from information_schema.INNODB_TRX

performance_schema:你可以查询performance_schema数据库中的表,以查看有关服务器及其运行的应用程序的性能特征的实时信息。有关详细信息,请参阅 第 27 章,MySQL 性能模式

information_schema:您可以使用InnoDB information_schema表提取有关InnoDB管理的架构对象的元数据。

锁超时

事务1:

mysql> begin;update test set phone='123' where id=1;
Query OK, 0 rows affected (0.00 sec)

事务2:

mysql> begin;update test set phone='123' where id=1;
Query OK, 0 rows affected (0.00 sec)

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

show processlist:

mysql> show processlist;
+----+-----------------+-----------+---------+---------+-------+------------------------+----------------------------------------+
| Id | User            | Host      | db      | Command | Time  | State                  | Info                                   |
+----+-----------------+-----------+---------+---------+-------+------------------------+----------------------------------------+
|  5 | event_scheduler | localhost | NULL    | Daemon  | 13118 | Waiting on empty queue | NULL                                   |
|  9 | root            | localhost | my_test | Sleep   |   197 |                        | NULL                                   |
| 10 | root            | localhost | my_test | Query   |     6 | updating               | update test set phone='123' where id=1 |
| 11 | root            | localhost | my_test | Query   |     0 | init                   | show processlist                       |
+----+-----------------+-----------+---------+---------+-------+------------------------+----------------------------------------+
4 rows in set (0.00 sec)

可以看到线程id为10的处于update阻塞中,同时info展示了正在阻塞的语句。

information_schema.INNODB_TRX:

mysql> select * from information_schema.INNODB_TRX;
+--------+-----------+---------------------+------------------------------+---------------------+------------+---------------------+----------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+---------------------+
| trx_id | trx_state | trx_started         | trx_requested_lock_id        | trx_wait_started    | trx_weight | trx_mysql_thread_id | trx_query                              | trx_operation_state | trx_tables_in_use | trx_tables_locked | trx_lock_structs | trx_lock_memory_bytes | trx_rows_locked | trx_rows_modified | trx_concurrency_tickets | trx_isolation_level | trx_unique_checks | trx_foreign_key_checks | trx_last_foreign_key_error | trx_adaptive_hash_latched | trx_adaptive_hash_timeout | trx_is_read_only | trx_autocommit_non_locking | trx_schedule_weight |
+--------+-----------+---------------------+------------------------------+---------------------+------------+---------------------+----------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+---------------------+
|  45579 | LOCK WAIT | 2022-04-23 15:38:19 | 5652286592:2:4:20:4471239192 | 2022-04-23 15:38:19 |          2 |                  10 | update test set phone='123' where id=1 | starting index read |                 1 |                 1 |                2 |                  1128 |               1 |                 0 |                       0 | REPEATABLE READ     |                 1 |                      1 | NULL                       |                         0 |                         0 |                0 |                          0 |                   1 |
|  45577 | RUNNING   | 2022-04-23 15:35:08 | NULL                         | NULL                |          3 |                   9 | NULL                                   | NULL                |                 0 |                 1 |                2 |                  1128 |               1 |                 1 |                       0 | REPEATABLE READ     |                 1 |                      1 | NULL                       |                         0 |                         0 |                0 |                          0 |                NULL |
+--------+-----------+---------------------+------------------------------+---------------------+------------+---------------------+----------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+---------------------+
2 rows in set (0.00 sec)

可以看到trx_id为45579、trx_mysql_thread_id为10的事务处于锁等待中,同时也展示了执行的语句。我们来看看锁的持有和请求情况:

performance_schema.data_locks:

mysql> select * from performance_schema.data_locks;
+--------+------------------------------+-----------------------+-----------+----------+---------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
| ENGINE | ENGINE_LOCK_ID               | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE     | LOCK_STATUS | LOCK_DATA |
+--------+------------------------------+-----------------------+-----------+----------+---------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
| INNODB | 5652286592:1063:4688481592   |                 45579 |        51 |       16 | my_test       | test        | NULL           | NULL              | NULL       |            4688481592 | TABLE     | IX            | GRANTED     | NULL      |
| INNODB | 5652286592:2:4:20:4471239192 |                 45579 |        51 |       16 | my_test       | test        | NULL           | NULL              | PRIMARY    |            4471239192 | RECORD    | X,REC_NOT_GAP | WAITING     | 1         |
| INNODB | 5652287384:1063:4688483544   |                 45577 |        50 |       20 | my_test       | test        | NULL           | NULL              | NULL       |            4688483544 | TABLE     | IX            | GRANTED     | NULL      |
| INNODB | 5652287384:2:4:20:4471243800 |                 45577 |        50 |       20 | my_test       | test        | NULL           | NULL              | PRIMARY    |            4471243800 | RECORD    | X,REC_NOT_GAP | GRANTED     | 1         |
+--------+------------------------------+-----------------------+-----------+----------+---------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
4 rows in set (0.00 sec)

可以看到trx_id为45579正在等待id为1的行锁,行锁被trx_id为45577的事务持有。

解决

笔者一般遇到锁超时的时候是接口响应时间明显超常,一般遇到这样的问题是因为另外一个事务出现了IO阻塞或者等待或者处理其他逻辑耗时导致事务一直没有被提交。遇到这种异常情况可以先把占用锁超时的线程给kill掉,避免下次还会影响其他正常的事务。后续需要去分析占用超常时间的事务代码,是什么原因导致的,是一个大事务?事务中间有慢查询?事务中间进行了其他请求,对应的资源出现了请求超时?等等。

kill的话就找到长期占用锁的事务对应的线程id,如上面的事务trx_id为45577,线程id为9。当然这个只是临时解决,可以对这个事务对应的功能进行降级,处理完后在重新上线。

死锁

先后执行 事务1:

mysql> begin;update test set phone='123' where id=1;
Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

事务2:

mysql> begin;update user set phone ='123' where id=1;
Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

事务1:

mysql> update user set phone ='123' where id=1;
Query OK, 1 row affected (5.67 sec)
Rows matched: 1  Changed: 1  Warnings: 0

事务2:

mysql> update test set phone='123' where id=1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

马上就监测出来死锁并回滚了。

解决

上面的例子其实就可以映射于实际开发中,需要尽量保证sql执行顺序的一致,首先就是方法的抽出,当然一些情况无法容易注意,最简单且不容易出错的方式就是加分布式锁。