Greenplum Database Failure Analysis: semop(id=2000421076,num=11) failed: invalid argument

Reader submission, 2022-09-19

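While the business team was installing Greenplum, the master node crashed repeatedly. A colleague went through the logs and found the following error: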

2022-06-23 21:01:21.201568 CST,,,p52171,th-761145216,,,0,,,seg-1,,,,,"FATAL","XX000","no free slots in PMChildFlags array",,,,,,,0,,,"pmsignal.c",173,"Stack trace:1 ...."

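At first sight this error seemed unbelievable: there were no free slots in the PMChildFlags array. The error is raised in the AssignPostmasterChildSlot function; for the detailed flow, see the article "PostgreSQL PMsignal: signal communication between backend processes and the postmaster". Even the comment on the line that raises the error says "Out of slots ... should never happen". PMSignalShmemSize shows that the number of PMChildFlags slots is 2 * (MaxConnections + autovacuum_max_workers + 1 + max_worker_processes), which is far larger than MaxConnections. In other words, client connections should be capped by MaxConnections long before they can exhaust the PMChildFlags slots. On top of that, the database had only just started and no application connections had come in yet. I was completely stumped.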

int AssignPostmasterChildSlot(void)
{
    int slot = PMSignalState->next_child_flag;
    int n;

    /*
     * Scan for a free slot. We track the last slot assigned so as not to
     * waste time repeatedly rescanning low-numbered slots.
     */
    for (n = PMSignalState->num_child_flags; n > 0; n--)
    {
        if (--slot < 0)
            slot = PMSignalState->num_child_flags - 1;
        if (PMSignalState->PMChildFlags[slot] == PM_CHILD_UNUSED)
        {
            PMSignalState->PMChildFlags[slot] = PM_CHILD_ASSIGNED;
            PMSignalState->next_child_flag = slot;
            return slot + 1;
        }
    }

    /* Out of slots ... should never happen, else postmaster.c messed up */
    elog(FATAL, "no free slots in PMChildFlags array");
    return 0; /* keep compiler quiet */
}

Size PMSignalShmemSize(void)
{
    Size size;

    size = offsetof(PMSignalData, PMChildFlags);
    size = add_size(size, mul_size(MaxLivePostmasterChildren(), sizeof(sig_atomic_t)));
    return size;
}

int MaxLivePostmasterChildren(void)
{
    return 2 * (MaxConnections + autovacuum_max_workers + 1 + max_worker_processes);
}

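With no better lead, and since the problem was related to shared memory, I grepped all the logs for related keywords. The log from one of the reinstall attempts contained the error shown below, and it appeared right after the error above. A Baidu search turned up a blog post from the HighGo PG Lab about a related problem, "Database restart caused by semctl(156532736, 0, IPC_RMID, …) failed: Invalid argument", which I excerpt after the log line below.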

[gpadmin@xxx pg_log] cat * | grep 'sem'
2022-06-23 21:05:36.272884 CST,,,p66817,th-1473353856,,,0,con2,,seg-1,,,,,"FATAL","XX000","semop(id=2000421076,num=11) failed: invalid argument"

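The database log showed the following errors at irregular intervals, and each occurrence caused the database to restart. Platform: Linux x86-64, Red Hat Enterprise Linux 7. Version: 9.5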

FATAL,XX000,semop(id=157450268) failed: Invalid argument
FATAL,XX000,semop(id=157843496) failed: Invalid argument
PANIC,XX000,queueing for lock while waiting on another one
terminating any other active server processes
WARNING,57P02,terminating connection because of crash of another server process,The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.,In a moment you should be able to reconnect to the database and repeat your command.
archiver process (PID 3766) exited with exit code 1
FATAL,57P03,the database system is in recovery mode
all server processes terminated; reinitializing
could not remove shared memory segment /PostgreSQL.44345806: No such file or directory
semctl(156532736, 0, IPC_RMID, ...) failed: Invalid argument
semctl(156565505, 0, IPC_RMID, ...) failed: Invalid argument
FATAL,57P03,the database system is in recovery mode
database system was interrupted; last known up at 2018-12-27 04:54:36 CST
database system was not properly shut down; automatic recovery in progress
redo starts at 5E7/7036BD30
FATAL,57P03,the database system is in recovery mode
invalid record length at 5E7/75359EC8
redo done at 5E7/75359EA0
last completed transaction was at log time 2018-12-27 05:06:26.652179+08
MultiXact member wraparound protections are now enabled
autovacuum launcher started
database system is ready to accept connections

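The cause is that the RemoveIPC parameter is set to yes. RemoveIPC, configured in /etc/systemd/logind.conf, controls whether System V IPC objects are removed when a user fully logs out. According to the excerpted post, the parameter has been on by default since systemd 212 (2014-03-25), while RHEL 7 has it starting with systemd 219, where it is apparently off by default. When RemoveIPC=yes, the semaphore objects used by the PostgreSQL server are removed at seemingly random times, which crashes the server and produces log entries such as: LOG: semctl(1234567890, 0, IPC_RMID, ...) failed: Invalid argument. Shared memory segments that still have processes attached are not cleaned up, so systemd does not remove a segment that is in use. Semaphores, however, have no concept of attached processes, so they are removed even while they are still in use.

Solution:

(1) Set the "RemoveIPC" field in "/etc/systemd/logind.conf" to "no". Open logind.conf with vim:
vim /etc/systemd/logind.conf
Set the "RemoveIPC" field to "no":
RemoveIPC=no

(2) Set the "RemoveIPC" field in "/usr/lib/systemd/system/systemd-logind.service" to "no". Open systemd-logind.service with vim:
vim /usr/lib/systemd/system/systemd-logind.service
Set the "RemoveIPC" field to "no":
RemoveIPC=no

(3) Reload the configuration.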

systemctl daemon-reload
systemctl restart systemd-logind

(4) Check that the change has taken effect.

loginctl show-session | grep RemoveIPC
systemctl show systemd-logind | grep

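However, the problem persisted after this change, so the only option left was to analyze it myself. According to the article "PostgreSQL semaphore mechanism: how PGSemaphore works under the hood", the error semop(id=2000421076,num=11) failed: invalid argument is reported from PGSemaphoreLock, PGSemaphoreLockInterruptable, PGSemaphoreUnlock, and PGSemaphoreTryLock, and the platform here uses the SysV semaphore facilities. All of these functions take a variable of the type shown below, and the memory behind a PGSemaphoreData variable lives in shared memory. In other words, the semId obtained from the SysV semget call and the semaphore's number within that set (semNum) are both stored in the PGSemaphoreData variable.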

#ifdef USE_SYSV_SEMAPHORES
typedef struct PGSemaphoreData
{
    int semId;  /* semaphore set identifier */
    int semNum; /* semaphore number within set */
} PGSemaphoreData;
#endif

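PGSemaphoreCreate initializes a PGSemaphore structure to represent a semaphore with a count of 1. It is only ever called in the postmaster daemon; ordinary backend processes are not allowed to create semaphores. Since the creation step reported no error, creation itself should be fine.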

void PGSemaphoreCreate(PGSemaphore sema)
{
    Assert(!IsUnderPostmaster); /* Can't do this in a backend, because static state is postmaster's */

    if (nextSemaNumber >= SEMAS_PER_SET)
    {
        /* Time to allocate another semaphore set */
        if (numSemaSets >= maxSemaSets)
            elog(PANIC, "too many semaphores created");
        mySemaSets[numSemaSets] = IpcSemaphoreCreate(SEMAS_PER_SET);
        numSemaSets++;
        nextSemaNumber = 0;
    }
    sema->semId = mySemaSets[numSemaSets - 1]; /* Assign the next free semaphore in the current set */
    sema->semNum = nextSemaNumber++;
    IpcSemaphoreInitialize(sema->semId, sema->semNum, 1); /* Initialize it to count 1 */
}

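So the root problem is that our IPC resources (here, the semaphore set) were being removed at the operating-system level. With RemoveIPC ruled out, two possibilities remain:

1. Collateral damage from one of the business team's scripts, for example one that calls ipcrm. There is precedent: a session-cleanup script of theirs once took down all segment database processes. This can be checked by running ipcs -a in a loop and comparing the moment the master's semaphores disappear with the moment the database crashes.

2. The company's Linux distribution applies special settings to particular system users. For example, the master database process was in session-72887.scope under user-1000.slice, while the segment database processes were under user-71381.slice. 1000 is the UID of the ubuntu account and 71381 is the UID of gpadmin, and some projects like to apply special settings to the ubuntu account.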

ipcs -a

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x2369b4bc 3309575    gpadmin    640        7535067136 18

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xd5b0c6f8 1179650    gpadmin    640        154

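By then it was late at night and the business team was no longer willing to leave the environment with us for further investigation, so we gave up on that node and deployed to other nodes instead. Having dug this far, I was left with a deep sense of helplessness in the face of a vast and complicated operating system.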
