PostgreSQL Mini Case: tuple concurrently deleted

Preface

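Today I was deep in a debugging session. Yes, I had once again hit a bug I ran into long ago: different users getting different execution plans (I've written up a similar case before). While I was still thoroughly confused by it, someone in a chat group asked this question: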

ERROR:   tuple concurrently deleted
Hi all, how does this error come about, and how can it be reproduced?

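I was tired of staring at my own bug anyway, so this was a welcome change of focus. At first glance, the error says the tuple was deleted concurrently. Normally, a transaction that deletes a tuple holds the corresponding row lock, so another transaction trying to delete the same row is blocked. So under what circumstances does this error occur? Let's analyze it.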
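To make "the second transaction is blocked" concrete, here is a minimal C sketch of what a plain DELETE does when it reaches a row under the default READ COMMITTED level. All types and names (HypoTuple, try_delete and so on) are hypothetical and heavily simplified. The modeled idea: PostgreSQL stamps the deleting transaction's ID into the tuple's xmax, a second deleter waits while that transaction is still open, and, as I understand ExecDelete, once the first deleter commits the row is simply skipped rather than raising an error.

```c
#include <assert.h>

/*
 * Hypothetical, heavily simplified model of a single heap tuple; these are
 * not PostgreSQL's real structs. The deleting transaction's ID is kept in
 * xmax, which is what acts as the row lock.
 */
typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactState;

typedef struct {
    unsigned  xmax;       /* 0 = no transaction has deleted/locked the tuple */
    XactState xmax_state; /* state of the transaction stored in xmax */
} HypoTuple;

typedef enum { DEL_DONE, DEL_MUST_WAIT, DEL_SKIPPED } DeleteOutcome;

/*
 * One attempt by transaction my_xid to DELETE the tuple (READ COMMITTED).
 * After DEL_MUST_WAIT the caller retries once the other transaction ends.
 */
DeleteOutcome try_delete(HypoTuple *tup, unsigned my_xid)
{
    /* The model only covers two distinct transactions. */
    assert(my_xid != 0 && tup->xmax != my_xid);

    if (tup->xmax != 0 && tup->xmax_state == XACT_IN_PROGRESS)
        return DEL_MUST_WAIT;   /* another open transaction holds the row */

    if (tup->xmax != 0 && tup->xmax_state == XACT_COMMITTED)
        return DEL_SKIPPED;     /* already gone: skipped quietly, no error
                                   (REPEATABLE READ would raise a
                                   serialization failure instead) */

    /* Untouched, or the earlier deleter rolled back: claim the tuple. */
    tup->xmax = my_xid;
    tup->xmax_state = XACT_IN_PROGRESS;
    return DEL_DONE;
}
```

Mapped onto the first reproduction: session 1 gets DEL_DONE, session 2 gets DEL_MUST_WAIT, and if session 1 then commits, session 2's DELETE finishes with DELETE 0 rather than an error.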

Reproduction

First, let's try to reproduce it the obvious way:

postgres=# create table t3(id int);
CREATE TABLE
postgres=# insert into t3 values(1);
INSERT 0 1
postgres=# begin;
BEGIN
postgres=*# delete from t3;
DELETE 1

Open a new session and delete the data there too:

postgres=# begin;
BEGIN
postgres=*# delete from t3;
--- hangs here

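As the analysis above predicted, the second session is indeed blocked. I did find a similar case online (link), but that one was ERROR: tuple concurrently updated. One of the answers explains it like this:

Your plain subquery fetches up to 100 rows but does not lock them against write access. A concurrent transaction can update or delete one or more of those rows before the DELETE locks them (at least under the default isolation level READ COMMITTED). That leads to the error message you see.

I won't dig into that detail here, especially since that case was on the ancient 9.0. Instead, let's look at where tuple concurrently deleted is raised in the source: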

/*
 * simple_heap_update - replace a tuple
 *
 * This routine may be used to update a tuple when concurrent updates of
 * the target tuple are not expected (for example, because we have a lock
 * on the relation associated with the tuple).  Any failure is reported
 * via ereport().
 */
void
simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup)
{
 TM_Result result;
 TM_FailureData tmfd;
 LockTupleMode lockmode;

 result = heap_update(relation, otid, tup,
       GetCurrentCommandId(true), InvalidSnapshot,
       true /* wait for commit */ ,
       &tmfd, &lockmode);
 switch (result)
 {
  case TM_SelfModified:
   /* Tuple was already updated in current command? */
   elog(ERROR, "tuple already updated by self");
   break;

  case TM_Ok:
   /* done successfully */
   break;

  case TM_Updated:
   elog(ERROR, "tuple concurrently updated");
   break;

  case TM_Deleted:
   elog(ERROR, "tuple concurrently deleted");
   break;

  default:
   elog(ERROR, "unrecognized heap_update status: %u", result);
   break;
 }
}

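TM_Deleted means the tuple has already been deleted by another transaction. This function lives in heapam.c, which implements the core operations of the heap access method (am stands for access method), such as scanning tables and inserting, updating and deleting tuples.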
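The difference between TM_Updated and TM_Deleted comes down to where the old tuple's ctid points once the conflicting transaction has committed: an UPDATE points it at the new row version, while a DELETE leaves it pointing at the tuple itself. Below is a minimal sketch of that classification and of how simple_heap_update turns both outcomes into errors. HypoTid, HypoTupleHeader, classify_conflict and simple_update_error are hypothetical stand-ins, not PostgreSQL's ItemPointerData, HeapTupleHeader or any real function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not PostgreSQL's ItemPointerData/HeapTupleHeader. */
typedef struct { unsigned block; unsigned offset; } HypoTid;

typedef struct {
    HypoTid self;  /* where this tuple version is stored */
    HypoTid ctid;  /* itself after a DELETE; the newer version after an UPDATE */
} HypoTupleHeader;

typedef enum { HYPO_TM_UPDATED, HYPO_TM_DELETED } HypoConflict;

static int tid_equal(HypoTid a, HypoTid b)
{
    return a.block == b.block && a.offset == b.offset;
}

/* Classify a tuple whose conflicting writer has already committed. */
HypoConflict classify_conflict(const HypoTupleHeader *tup)
{
    assert(tup != NULL);
    return tid_equal(tup->self, tup->ctid) ? HYPO_TM_DELETED : HYPO_TM_UPDATED;
}

/* Same mapping as the switch in simple_heap_update: both cases are errors. */
const char *simple_update_error(HypoConflict c)
{
    return c == HYPO_TM_DELETED ? "tuple concurrently deleted"
                                : "tuple concurrently updated";
}
```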

Note this part of the comment:

for example, because we have a lock on the relation associated with the tuple

So let's try operations that drop or lock the table:

postgres=# create table t3(id int);
CREATE TABLE
postgres=# insert into t3 values(1);
INSERT 0 1
postgres=# begin;
BEGIN
postgres=*# drop table t3;
DROP TABLE

Open a new session and run a delete:

postgres=# begin;
BEGIN
postgres=*# delete from t3;
--- hangs

But in this case, once the first session commits, the error says the table does not exist:

postgres=*# delete from t3;
ERROR:  relation "t3" does not exist
LINE 1: delete from t3;
                    ^

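On reflection, this makes sense: DROP TABLE takes a level-8 lock (AccessExclusiveLock), which blocks all other access to the table. So that route doesn't work. Next, let's try the system catalogs; changes to catalogs mostly come from operations such as altering a table's structure or granting privileges. Sure enough, once I got to GRANT, the error reproduced reliably.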
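Why the DELETE had to wait can be read off the table-level lock conflict table in the PostgreSQL documentation: DELETE requests RowExclusiveLock, DROP TABLE holds AccessExclusiveLock (mode 8, the "level-8 lock"), and mode 8 conflicts with every mode. A minimal C sketch of that table as bitmasks follows; the mode names and numbers are the ones used in lockdefs.h, while the conflicts array and lock_conflicts() are my own illustration, not the actual code in lock.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Table-level lock modes, with the names and numbers used in lockdefs.h. */
enum {
    AccessShareLock = 1,      /* SELECT */
    RowShareLock,             /* SELECT ... FOR UPDATE/SHARE */
    RowExclusiveLock,         /* INSERT, UPDATE, DELETE */
    ShareUpdateExclusiveLock, /* VACUUM (non-FULL), ANALYZE, ... */
    ShareLock,                /* CREATE INDEX (non-concurrent) */
    ShareRowExclusiveLock,
    ExclusiveLock,
    AccessExclusiveLock       /* DROP TABLE, TRUNCATE, ... */
};

#define M(mode) (1u << (mode))

/* conflicts[held] = set of requested modes that must wait behind `held`. */
static const unsigned conflicts[AccessExclusiveLock + 1] = {
    [AccessShareLock]          = M(AccessExclusiveLock),
    [RowShareLock]             = M(ExclusiveLock) | M(AccessExclusiveLock),
    [RowExclusiveLock]         = M(ShareLock) | M(ShareRowExclusiveLock)
                               | M(ExclusiveLock) | M(AccessExclusiveLock),
    [ShareUpdateExclusiveLock] = M(ShareUpdateExclusiveLock) | M(ShareLock)
                               | M(ShareRowExclusiveLock) | M(ExclusiveLock)
                               | M(AccessExclusiveLock),
    [ShareLock]                = M(RowExclusiveLock) | M(ShareUpdateExclusiveLock)
                               | M(ShareRowExclusiveLock) | M(ExclusiveLock)
                               | M(AccessExclusiveLock),
    [ShareRowExclusiveLock]    = M(RowExclusiveLock) | M(ShareUpdateExclusiveLock)
                               | M(ShareLock) | M(ShareRowExclusiveLock)
                               | M(ExclusiveLock) | M(AccessExclusiveLock),
    [ExclusiveLock]            = M(RowShareLock) | M(RowExclusiveLock)
                               | M(ShareUpdateExclusiveLock) | M(ShareLock)
                               | M(ShareRowExclusiveLock) | M(ExclusiveLock)
                               | M(AccessExclusiveLock),
    [AccessExclusiveLock]      = 0x1FEu  /* bits 1..8: conflicts with every mode */
};

/* true if a request in mode `requested` must wait behind a lock held in `held` */
bool lock_conflicts(int held, int requested)
{
    assert(held >= AccessShareLock && held <= AccessExclusiveLock);
    assert(requested >= AccessShareLock && requested <= AccessExclusiveLock);
    return (conflicts[held] & M(requested)) != 0;
}
```

For example, lock_conflicts(AccessExclusiveLock, RowExclusiveLock) is true, which is exactly the DROP TABLE vs DELETE wait above, while two concurrent DELETEs (RowExclusiveLock vs RowExclusiveLock) do not conflict at the table level and only meet at the row level.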

postgres=# begin;
BEGIN
postgres=*# drop table t3;
DROP TABLE

In a second session, while the first transaction is still open, grant a privilege on t3:

postgres=# begin;
BEGIN
postgres=*# grant select on t3 to u1;
ERROR:  tuple concurrently deleted
postgres=!# rollback ;
ROLLBACK
postgres=# \errverbose 
ERROR:  XX000: tuple concurrently deleted
LOCATION:  simple_heap_update, heapam.c:4191

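This happens because GRANT modifies the relacl column of t3's row in pg_class. As far as I can tell, GRANT does not lock t3 itself beforehand, so unlike the DELETE above it is not held back by the AccessExclusiveLock. It finds t3's pg_class row (a row deleted by a transaction that has not yet committed is still visible to other sessions) and then tries to overwrite it via simple_heap_update. heap_update waits for the DROP transaction to finish; once that transaction commits, the row turns out to have been deleted, heap_update returns TM_Deleted, and simple_heap_update raises the error. In other words, the error appears only after the first session commits; had the DROP been rolled back, the GRANT would have succeeded.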
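The GRANT timeline can be modeled in the same spirit. This is a minimal sketch, assuming (as I read it) that GRANT locates t3's pg_class row without first locking the table. DeleterState, visible_to_other_session() and write_new_relacl() are hypothetical names. The write step assumes the row was already found while the DROP was still uncommitted; a session that looked it up only after the commit would instead get "relation does not exist".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of t3's pg_class row; names are made up for illustration. */
typedef enum {
    NO_DELETER,         /* nobody has dropped t3 */
    DELETER_OPEN,       /* session 1 ran DROP TABLE, not yet committed */
    DELETER_COMMITTED,  /* session 1 committed the DROP */
    DELETER_ABORTED     /* session 1 rolled back */
} DeleterState;

typedef enum { GRANT_WAITS, GRANT_OK, GRANT_TUPLE_CONCURRENTLY_DELETED } GrantStep;

/* Step 1: can session 2 still find the row? A row deleted by a transaction
   that has not committed is still visible to other sessions. */
bool visible_to_other_session(DeleterState s)
{
    assert(s <= DELETER_ABORTED);
    return s != DELETER_COMMITTED;
}

/* Step 2: session 2 writes the row back with the new relacl
   (the simple_heap_update path, which waits for the deleter). */
GrantStep write_new_relacl(DeleterState s)
{
    assert(s <= DELETER_ABORTED);
    switch (s) {
    case DELETER_OPEN:
        return GRANT_WAITS;                      /* heap_update blocks on session 1 */
    case DELETER_COMMITTED:
        return GRANT_TUPLE_CONCURRENTLY_DELETED; /* TM_Deleted -> elog(ERROR) */
    default:
        return GRANT_OK;                         /* untouched or rolled back */
    }
}
```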

Summary

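Through this small case, I wanted to share a simple approach to analyzing problems like this. As for the other bug, different users getting different execution plans: it shows up with partitioned tables, and I'll share the analysis once I've figured it out. The fix is simple: grant SELECT on the child tables (partitions).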
