Mysql锁超时&死锁定位及解决
前言
除了slow sql的处理,常见的还有锁超时和死锁的定位。
锁超时一般发生在两个update对同一行或者存在间隙锁的交集情况下,在业务中一个事务持有锁之后因为发生了IO阻塞或者等待或者处理其他逻辑耗时导致事务一直没有被提交,锁得不到释放。
死锁时一般发生在两个事务分别先后持有两个不同的锁。比如事务1先持有A锁,事务2先持有B锁,事务1后准备持有B锁,事务2后持有A锁,那么这两个事务后面打算持有的锁都被对方先持有了,那么就出现了死锁。
下面我们将对锁超时和死锁解释下怎么定位和解决,mysql版本为:
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.27 |
+-----------+
1 row in set (0.00 sec)
测试表结构为:
| Table | Create Table |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| test | CREATE TABLE `test` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`name` char(32) NOT NULL COMMENT '用户姓名',
`num` int DEFAULT NULL,
`phone` char(11) DEFAULT '' COMMENT '手机号',
PRIMARY KEY (`id`),
KEY `idx_name_phone` (`name`,`phone`)
) ENGINE=InnoDB AUTO_INCREMENT=106 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='test表' |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.04 sec)
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| user | CREATE TABLE `user` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`create_time` datetime(3) DEFAULT CURRENT_TIMESTAMP(3) COMMENT '创建时间',
`update_time` datetime(3) DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3) COMMENT '更新时间',
`is_delete` tinyint NOT NULL DEFAULT '0' COMMENT '是否删除 0:否 1:是',
`phone` char(11) DEFAULT '' COMMENT '手机号',
`user_name` char(64) NOT NULL COMMENT '用户名',
`head` varchar(255) DEFAULT NULL COMMENT '用户头像',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='用户表' |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
提前需要了解的
关于Mysql InnoDB的各类锁请参考《Mysql InnoDB之各类锁》,里面有介绍怎么查看持有锁和请求锁的情况可以使用select * from performance_schema.data_locks
进行查看这里就不做过多描述。查看有哪些线程正在执行:show processlist
。查看当前运行的所有事务:select * from information_schema.INNODB_TRX
。
performance_schema:你可以查询performance_schema数据库中的表,以查看有关服务器及其运行的应用程序的性能特征的实时信息。有关详细信息,请参阅 第 27 章,MySQL 性能模式。
information_schema:您可以使用InnoDB information_schema表提取有关InnoDB管理的架构对象的元数据。
锁超时
事务1:
mysql> begin;update test set phone='123' where id=1;
Query OK, 0 rows affected (0.00 sec)
事务2:
mysql> begin;update test set phone='123' where id=1;
Query OK, 0 rows affected (0.00 sec)
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
show processlist:
mysql> show processlist;
+----+-----------------+-----------+---------+---------+-------+------------------------+----------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-----------------+-----------+---------+---------+-------+------------------------+----------------------------------------+
| 5 | event_scheduler | localhost | NULL | Daemon | 13118 | Waiting on empty queue | NULL |
| 9 | root | localhost | my_test | Sleep | 197 | | NULL |
| 10 | root | localhost | my_test | Query | 6 | updating | update test set phone='123' where id=1 |
| 11 | root | localhost | my_test | Query | 0 | init | show processlist |
+----+-----------------+-----------+---------+---------+-------+------------------------+----------------------------------------+
4 rows in set (0.00 sec)
可以看到线程id为10的处于update阻塞中,同时info展示了正在阻塞的语句。
information_schema.INNODB_TRX:
mysql> select * from information_schema.INNODB_TRX;
+--------+-----------+---------------------+------------------------------+---------------------+------------+---------------------+----------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+---------------------+
| trx_id | trx_state | trx_started | trx_requested_lock_id | trx_wait_started | trx_weight | trx_mysql_thread_id | trx_query | trx_operation_state | trx_tables_in_use | trx_tables_locked | trx_lock_structs | trx_lock_memory_bytes | trx_rows_locked | trx_rows_modified | trx_concurrency_tickets | trx_isolation_level | trx_unique_checks | trx_foreign_key_checks | trx_last_foreign_key_error | trx_adaptive_hash_latched | trx_adaptive_hash_timeout | trx_is_read_only | trx_autocommit_non_locking | trx_schedule_weight |
+--------+-----------+---------------------+------------------------------+---------------------+------------+---------------------+----------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+---------------------+
| 45579 | LOCK WAIT | 2022-04-23 15:38:19 | 5652286592:2:4:20:4471239192 | 2022-04-23 15:38:19 | 2 | 10 | update test set phone='123' where id=1 | starting index read | 1 | 1 | 2 | 1128 | 1 | 0 | 0 | REPEATABLE READ | 1 | 1 | NULL | 0 | 0 | 0 | 0 | 1 |
| 45577 | RUNNING | 2022-04-23 15:35:08 | NULL | NULL | 3 | 9 | NULL | NULL | 0 | 1 | 2 | 1128 | 1 | 1 | 0 | REPEATABLE READ | 1 | 1 | NULL | 0 | 0 | 0 | 0 | NULL |
+--------+-----------+---------------------+------------------------------+---------------------+------------+---------------------+----------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+---------------------+
2 rows in set (0.00 sec)
可以看到trx_id为45579、trx_mysql_thread_id为10的事务处于锁等待中,同时也展示了执行的语句。我们来看看锁的持有和请求情况:
performance_schema.data_locks:
mysql> select * from performance_schema.data_locks;
+--------+------------------------------+-----------------------+-----------+----------+---------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+------------------------------+-----------------------+-----------+----------+---------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
| INNODB | 5652286592:1063:4688481592 | 45579 | 51 | 16 | my_test | test | NULL | NULL | NULL | 4688481592 | TABLE | IX | GRANTED | NULL |
| INNODB | 5652286592:2:4:20:4471239192 | 45579 | 51 | 16 | my_test | test | NULL | NULL | PRIMARY | 4471239192 | RECORD | X,REC_NOT_GAP | WAITING | 1 |
| INNODB | 5652287384:1063:4688483544 | 45577 | 50 | 20 | my_test | test | NULL | NULL | NULL | 4688483544 | TABLE | IX | GRANTED | NULL |
| INNODB | 5652287384:2:4:20:4471243800 | 45577 | 50 | 20 | my_test | test | NULL | NULL | PRIMARY | 4471243800 | RECORD | X,REC_NOT_GAP | GRANTED | 1 |
+--------+------------------------------+-----------------------+-----------+----------+---------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
4 rows in set (0.00 sec)
可以看到trx_id为45579正在等待id为1的行锁,行锁被trx_id为45577的事务持有。
解决
笔者一般遇到锁超时的时候是接口响应时间明显超常,一般遇到这样的问题是因为另外一个事务出现了IO阻塞或者等待或者处理其他逻辑耗时导致事务一直没有被提交。遇到这种异常情况可以先把占用锁超时的线程给kill掉,避免下次还会影响其他正常的事务。后续需要去分析占用超常时间的事务代码,是什么原因导致的,是一个大事务?事务中间有慢查询?事务中间进行了其他请求,对应的资源出现了请求超时?等等。
kill的话就找到长期占用锁的事务对应的线程id,如上面的事务trx_id为45577,线程id为9。当然这个只是临时解决,可以对这个事务对应的功能进行降级,处理完后在重新上线。
死锁
先后执行 事务1:
mysql> begin;update test set phone='123' where id=1;
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
事务2:
mysql> begin;update user set phone ='123' where id=1;
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
事务1:
mysql> update user set phone ='123' where id=1;
Query OK, 1 row affected (5.67 sec)
Rows matched: 1 Changed: 1 Warnings: 0
事务2:
mysql> update test set phone='123' where id=1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
马上就监测出来死锁并回滚了。
解决
上面的例子其实就可以映射于实际开发中,需要尽量保证sql执行顺序的一致,首先就是方法的抽出,当然一些情况无法容易注意,最简单且不容易出错的方式就是加分布式锁。
转载自:https://juejin.cn/post/7089835905325727775