Let's start this post with a love declaration: We love Linux packages. Whether these are Debian based deb packages, rpm's or other formats, installation of additional software is easy and fast. With the help of package managers such as apt, yum, zypper and others, it's even easier and dependencies are installed with the package. Great.
Some of the packages are very easy to maintain. They contain a single binary, maybe a few man pages and maybe a configuration file. That's it. As soon as upstream changes something in the source code, the maintainer can build a new package.
But there are packages with a much higher complexity around. Apache and Nginx are such packages. They include multiple modules where each module is itself its own open source project. These "sub-projects" may change fast and dependencies on core versions might change, too. To oversee all this requires a lot of dedication and time. That's not easy - especially in the open source community where a lot of work is done benevolent.
In this particular example we stumbled on a problem between Debian's Nginx 1.14 package and the embedded nchan module - and dedicated a blog post for it. Important: This is not a blame post! We want to show technical reasons why this problem occurred and present a workaround to handle the problem.
Our monitoring alerted us, that a container running Debian Buster (the current Debian Stable as of this writing) was going crazy and disk space quickly started to fill up:
After a quick verification it turned out that the Nginx error log was logging thousands of entries:
2020/11/13 13:45:10 [alert] 8368#8368: worker process 21797 exited on signal 6
2020/11/13 13:45:10 [alert] 8368#8368: shared memory zone "memstore" was locked by 21797
ter process /usr/sbin/nginx -g daemon on; master_process on;: /build/nginx-Cjs4TR/nginx-1.14.2/debian/modules/nchan/src/store/memory/memstore.c:701: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
2020/11/13 13:45:10 [alert] 8368#8368: worker process 21798 exited on signal 6
2020/11/13 13:45:10 [alert] 8368#8368: shared memory zone "memstore" was locked by 21798
ter process /usr/sbin/nginx -g daemon on; master_process on;: /build/nginx-Cjs4TR/nginx-1.14.2/debian/modules/nchan/src/store/memory/memstore.c:701: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
2020/11/13 13:45:10 [alert] 8368#8368: worker process 21799 exited on signal 6
2020/11/13 13:45:10 [alert] 8368#8368: shared memory zone "memstore" was locked by 21799
ter process /usr/sbin/nginx -g daemon on; master_process on;: /build/nginx-Cjs4TR/nginx-1.14.2/debian/modules/nchan/src/store/memory/memstore.c:701: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
2020/11/13 13:45:10 [alert] 8368#8368: worker process 21800 exited on signal 6
2020/11/13 13:45:10 [alert] 8368#8368: shared memory zone "memstore" was locked by 21800
ter process /usr/sbin/nginx -g daemon on; master_process on;: /build/nginx-Cjs4TR/nginx-1.14.2/debian/modules/nchan/src/store/memory/memstore.c:701: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
2020/11/13 13:45:10 [alert] 8368#8368: worker process 21801 exited on signal 6
2020/11/13 13:45:10 [alert] 8368#8368: shared memory zone "memstore" was locked by 21801
ter process /usr/sbin/nginx -g daemon on; master_process on;: /build/nginx-Cjs4TR/nginx-1.14.2/debian/modules/nchan/src/store/memory/memstore.c:701: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
The same errors were logged over and over again, just with different worker PIDs. According to these logged events, the errors seem to come from the nchan module. As the disk was filling up very quickly, we needed to find a way to solve this - fast.
Research led to issue #446 on the nchan repository. The relevant hint came from the nchan maintainer (slact):
The Debian repo is very far behind on Nchan updates, I think something like this issue was fixed over a year ago. If you don't need to use Nchan, just unload it.
As we don't use any of the nchan features in this particular setup, we decided to just do this as suggested: Unload the (dynamic) module and restart Nginx. This is quickly done:
root@nginx:~# ll /etc/nginx/modules-enabled/|grep nchan
lrwxrwxrwx 1 root root 49 Apr 20 2020 50-mod-nchan.conf -> /usr/share/nginx/modules-available/mod-nchan.conf
root@nginx:~# rm /etc/nginx/modules-enabled/50-mod-nchan.conf
rm: remove symbolic link '/etc/nginx/modules-enabled/50-mod-nchan.conf'? y
root@nginx:~# systemctl restart nginx
No more errors were logged after this.
Based on slact's comment, the obvious breadcrumb is to find the version of nchan itself. But as nchan is bundled with the Nginx package, dpkg is not helpful and only shows the version of the main Nginx package:
root@nginx:~# dpkg -l|grep nchan
ii libnginx-mod-nchan 1.14.2-2+deb10u3 amd64 Fast, flexible pub/sub server for Nginx
To find out which module version was used, the debian build package of the Nginx package needs to be downloaded and unpacked:
ckadm@mintp ~/build $ wget http://deb.debian.org/debian/pool/main/n/nginx/nginx_1.14.2-2+deb10u3.debian.tar.xz
ckadm@mintp ~/build $ xz -d nginx_1.14.2-2+deb10u3.debian.tar.xz
ckadm@mintp ~/build $ tar -xf nginx_1.14.2-2+deb10u3.debian.tar
The nchan module can be found in the extracted directory debian/modules/nchan. Inside this directory is not only the module's source code but also the changelog. Let's take a closer look at this:
ckadm@mintp ~/build $ head debian/modules/nchan/changelog.txt
1.0.8 (Nov. 28 2016)
fix: possible crash under severely heavy load, introduced in 1.0.7 with stack-overflow fix
1.0.7 (Nov. 27 2016)
fix: memory leak after websocket publisher uncleanly aborts connection
fix: misbehaving websocket publisher with nchan_publisher_upstream_request
fix: potential stack overflow with very large message buffers
fix: invalid memory access with empty nchan_publisher_upstream_request for websocket publisher
fix: incorrect handling of chunked response from nchan_publisher_upstream_request
fix: publishing through websocket too fast may result in buffered messages that never arrive
fix: DELETE to multiplexed channel should delete all listed channels
So this shows that nchan version 1.0.8 (from 2016) was used to build this Nginx 1.14.2 package. However there is one big problem with this version combination. On GitHub issue #460 the module maintainer mentions the following:
The Nginx internals changed around a little in 1.13.10: http://hg.nginx.org/nginx/rev/20f139e9ffa8
And also the nchan changelog mentions a couple of compatibility problems with Nginx > 1.13.10:
1.2.1 (Aug. 2 2018)
fix: authentication failure body not forwarded for Nginx > 1.13.10
1.2.0 (Jul. 23 2018)
fix: Publisher upstream compatibility for Nginx > 1.13.10
This means that any Nginx 1.13 and above should use nchan module with at least version 1.2.1 or newer. And this unfortunately also means that the current Debian Stable (Debian Buster) is still shipping an incompatible nchan module within the Nginx package.
For the upcoming Debian release (static codename: sid) however, this problem seems to be solved. According to Debian's Nginx repository on their GitLab, nchan was upgraded to 1.2.7 (see commit). However as of today, the Nginx package in the current Debian Stable is still bundled with an incompatible nchan module version.
If you are using Debian Buster and need Nginx with the nchan module, you should (re-)build the deb package yourself.
username from wrote on Dec 10th, 2022:
Thank You for this article. I have same problem.
yetzt from wrote on Oct 15th, 2021:
i had the same symptoms on ubuntu 20.04, nginx 1.17.10 and nchan 1.2.7
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Office PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder