Large Transaction Alarm Handling Method
Background
Large Transaction means the long-running transaction with a large number of data in operation. The large transaction brings a lot of risks, which is mainly attributable to two reasons below:
- Lock too much data, resulting in massive congestions and lock timeouts, and influencing other threads' normal execution to SQL statement.
- Long execution time easily leads to master-slave delay
The RDS adopts a large transaction monitoring item, "Maximum Transaction Execution Time", which is used in monitoring if there exists a large transaction. In the large transaction monitoring, the alarm policy is not added by default. So, the user needs to configure the alarm strategy according to their own needs.
The following focuses on how to respond to and deal with the possible large transaction monitoring.
References: Monitoring and Alarm Operations Guide
Problem Handling
Find problem
Channels for finding a large transaction include:
- On BCM, configure the monitoring policy for RDS's maximum transaction execution time. When reaching the alarm threshold, the system sends alarm information.
- View the monitoring trend chart on RDS instrument panel, and observe the maximum transaction execution time curve, as detailed below:
- If the time consumed rises during the database access, the yet-to-be-submitted large transaction probably blocks the execution of SQL on some threads.
Locate problem
- Step 1: Log in to the RDS instance using database account, execute the following command, view current process status, and check if there is an expected long-running SQL.
show processlist;For example, the following result is shown:
| 35620525 | db_user | ip:34880 | baidu_dba | Sleep  |  563 |           | NULL             |
| 35620617 | db_user | ip:35270 | NULL      | Query  |    0 | starting  | show processlist |- Step 2: print InnoDB core log. Recommend you to print it in the text file, facilitating the follow-up analysis.
SHOW ENGINE INNODB STATUS \G- Step 3: view the keywords in the file: ACTIVE
grep  -A2  ACTIVE  status.logFor example, the following result is shown:
---TRANSACTION 421530771110624, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 326595544, ACTIVE 565 sec
2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 35620525, OS thread handle 140055217084160, query id 400967592 127.0.0.1 db_user- Step 4: analysis. As mentioned above, find a transaction with an execution time of 565 seconds, and the transaction's thread ID matches the thread ID (35620525) viewed in Step 1.
Resolve problem
First, confirm if it is possible to roll back this transaction. If yes, log in to the RDS to execute "KILL" command:
KILL 35620525(threadID)At this moment, the client receives the following error information, which meets the expectation:
ERROR 2006 (HY000): MySQL server has gone away Observe the monitoring trend chart: the maximum transaction execution time returns to normal.
