A central syslog server, running syslog-ng in an LXC (system) container, stopped working. A quick look at the local syslog logs suggested that the file system was full:
Aug 24 10:06:01 syslog syslog-ng[42]: Error suspend timeout has elapsed, attempting to write again; fd='26'
Aug 24 10:06:01 syslog syslog-ng[42]: I/O error occurred while writing; fd='26', error='No space left on device (28)'
Aug 24 10:06:01 syslog syslog-ng[42]: Suspending write operation because of an I/O error; fd='26', time_reopen='60'
Yet when checking the container's file system, there was still plenty of space available:
ckadm@syslog ~ $ df -h /
Filesystem Type Size Used Avail Use% Mounted on
/dev/vgdata/syslog ext4 216G 192G 24G 90% /
ckadm@syslog ~ $ df -i /
Filesystem Type Inodes IUsed IFree IUse% Mounted on
/dev/vgdata/syslog ext4 14M 100K 14M 1% /
But df revealed that another file system was completely full: /dev!
root@syslog ~ # df -h /dev
Filesystem Type Size Used Avail Use% Mounted on
none tmpfs 492K 492K 0 100% /dev
There are a couple of explanations why this happened.
The /dev file system in an LXC container is (by default) created automatically when the container is started. From the LXC documentation:
By default, lxc creates a few symbolic links (fd,stdin,stdout,stderr) in the container's /dev directory but does not automatically create device node entries. This allows the container's /dev to be set up as needed in the container rootfs. If lxc.autodev is set to 1, then after mounting the container's rootfs LXC will mount a fresh tmpfs under /dev (limited to 500K by default, unless defined in lxc.autodev.tmpfs.size) and fill in a minimal set of initial devices. This is generally required when starting a container containing a "systemd" based "init" but may be optional at other times.
Another important hint is shown here: the default size of /dev is 500K (shown as 492K in df). That is not much, agreed, but it is usually enough, because this tmpfs filesystem should only contain some symbolic links and device nodes. As soon as "real" files are created within this path, by mistake or on purpose, this quickly results in problems.
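A quick way to spot such "real" files is to list everything under /dev that is a regular file rather than a device node or symbolic link. The following is only a minimal sketch: the function name list_regular_files is made up for illustration, and it assumes a POSIX find is available inside the container.

```shell
#!/bin/sh
# List regular files below a directory (default: /dev).
# -xdev keeps find on the same filesystem, so separately mounted
# filesystems below the directory (e.g. /dev/shm, /dev/pts) are skipped.
list_regular_files() {
    dir="${1:-/dev}"
    find "$dir" -xdev -type f
}
```

Run as list_regular_files /dev inside the affected container, the output would have included /dev/tty10.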
Note: Since LXC 4.0 it is possible to define a bigger size than 500K using the LXC config option lxc.autodev.tmpfs.size (as mentioned in the quote). We have provided the relevant code change in the LXC project.
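For a container that genuinely needs a larger /dev, the option could be set in the container's config file on the host. This is a sketch under assumptions: LXC 4.0 or newer, a value given in bytes (so the 500K default corresponds to 500000), and an arbitrary example value of 1000000.

```
# Container config on the LXC host, e.g. /var/lib/lxc/syslog/config
lxc.autodev = 1
# Assumed to be in bytes; 1000000 is only an illustrative value
lxc.autodev.tmpfs.size = 1000000
```

A larger limit only postpones the problem if a process keeps writing regular files into /dev, so it does not replace fixing the writer.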
So what filled up /dev? This question can easily be answered by looking at the logged errors from above again:
Aug 24 10:06:01 syslog syslog-ng[42]: Error suspend timeout has elapsed, attempting to write again; fd='26'
Aug 24 10:06:01 syslog syslog-ng[42]: I/O error occurred while writing; fd='26', error='No space left on device (28)'
Aug 24 10:06:01 syslog syslog-ng[42]: Suspending write operation because of an I/O error; fd='26', time_reopen='60'
syslog-ng complains that it cannot write to fd (file descriptor) 26. So what exactly is fd 26? The real path can be found through the /proc filesystem, using the PID of syslog-ng:
root@syslog ~ # pgrep syslog-ng
13072
root@syslog ~ # ls -l /proc/13072/fd/26
l-wx------ 1 root root 64 Aug 23 09:15 /proc/13072/fd/26 -> /dev/tty10
Obviously fd 26 points to /dev/tty10.
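The same lookup can be wrapped in a small helper. This is a minimal sketch; the function name resolve_fd is hypothetical, and reading another process's /proc/PID/fd entries generally requires root or the same user.

```shell
#!/bin/sh
# Print the path that file descriptor $2 of process $1 points to,
# by reading the symbolic link the kernel exposes under /proc.
resolve_fd() {
    readlink "/proc/$1/fd/$2"
}
```

In the situation described here, resolve_fd 13072 26 would print /dev/tty10.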
This tty10 can also be found in syslog-ng's config:
root@syslog ~ # grep tty10 /etc/syslog-ng/syslog-ng.conf
destination d_console_all { file(`tty10`); };
In this case the syslog-ng configuration defines a destination d_console_all that writes to tty10, as console output for logging. However, there's a small "problem" with TTYs in containers.
This article won't explain what a TTY is, but if you want to know more, the article "The TTY demystified" by Linus Akesson is a great read!
When an LXC container is started, it automatically creates a tmpfs under /dev, as mentioned above. This also includes tty devices, which are used (and needed) for user interaction such as SSH input/output. The number of tty devices is configurable with the configuration option lxc.tty in LXC < 3.0 and lxc.tty.max in LXC >= 3.0.
Default configurations of LXC containers usually include "base configurations". Here on a Debian 9 (stretch) running an older LXC 2.x, the container's config file includes a common configuration file adapted for Debian systems:
root@lxchost ~ # egrep "^lxc.include" -B 1 /var/lib/lxc/syslog/config
# Common configuration
lxc.include = /usr/share/lxc/config/debian.common.conf
This included configuration file in turn includes yet another config file:
root@lxchost ~ # cat /usr/share/lxc/config/debian.common.conf
# This derives from the global common config
lxc.include = /usr/share/lxc/config/common.conf
[...]
And finally, inside this common.conf configuration file, the number of TTYs is defined:
root@lxchost ~ # grep tty /usr/share/lxc/config/common.conf
lxc.devttydir = lxc
# Setup 4 tty devices
lxc.tty = 4
### /dev/tty
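Following the include chain by hand gets tedious with several containers. The walk above can be sketched as a small recursive helper; the name show_tty_settings is made up for illustration, and it only follows lxc.include entries that point to a regular file (an include that points to a directory is skipped).

```shell
#!/bin/sh
# Print every lxc.tty / lxc.tty.max line found in an LXC config file
# and in the files it pulls in via lxc.include, prefixed by file name.
show_tty_settings() {
    cfg="$1"
    [ -f "$cfg" ] || return 0
    # Passing /dev/null as a second file makes grep print the file name.
    grep -E '^[[:space:]]*lxc\.tty(\.max)?[[:space:]]*=' "$cfg" /dev/null || true
    # Extract the include targets and recurse into each of them.
    sed -n 's/^[[:space:]]*lxc\.include[[:space:]]*=[[:space:]]*//p' "$cfg" |
        sed 's/[[:space:]]*$//' |
        while IFS= read -r inc; do
            show_tty_settings "$inc"
        done
}
```

On the LXC host it would be called with the container's config file, for example show_tty_settings /var/lib/lxc/syslog/config.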
With lxc.tty = 4 in effect (unless overridden in the container's config file), 4 tty devices are created inside the container's /dev filesystem:
root@syslog ~ # ll /dev/tt*
crw-rw-rw- 1 root root 5, 0 Aug 21 2018 /dev/tty
crw--w---- 1 root tty 136, 0 Aug 21 2018 /dev/tty1
-rw-r----- 1 root adm 503808 Aug 23 09:21 /dev/tty10
crw--w---- 1 root tty 136, 1 Aug 21 2018 /dev/tty2
crw--w---- 1 root tty 136, 2 Aug 21 2018 /dev/tty3
crw--w---- 1 root tty 136, 3 Aug 21 2018 /dev/tty4
As you can see, there are the tty devices tty[1-4], but there is another entry: tty10. Its mode string starts with "-" instead of "c", which shows that this is not a special device, it is a regular file! Its size of 503808 bytes is exactly the 492K that df reports for /dev. This file was created and written by syslog-ng, because the syslog-ng configuration tells syslog-ng to write the console output to this path. syslog-ng does not verify whether the destination is a special device node or a file, it just writes into it... until /dev is filled up (which happens pretty quickly given the 500K capacity).
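Since syslog-ng does not make this distinction itself, it can be checked from the shell before a path is used as a console destination. A minimal sketch, with hypothetical function names, based on the standard test operators -c (character device) and -f (regular file):

```shell
#!/bin/sh
# Succeed only if the path exists and is a character device node.
is_char_device() {
    [ -c "$1" ]
}

# Report what kind of object a path is.
describe_path() {
    if is_char_device "$1"; then
        echo "$1: character device"
    elif [ -f "$1" ]; then
        echo "$1: regular file"
    elif [ -e "$1" ]; then
        echo "$1: other"
    else
        echo "$1: missing"
    fi
}
```

In the container shown above, describe_path /dev/tty2 would report a character device, while describe_path /dev/tty10 would report a regular file.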
Now that we know there is no /dev/tty10 device in the container, syslog-ng's configuration needs to be adjusted so that d_console_all points to one of the existing TTYs:
root@syslog ~ # grep tty10 -A 1 -B 1 /etc/syslog-ng/syslog-ng.conf
#destination d_console_all { file(`tty10`); };
destination d_console_all { file("/dev/tty2"); };
Here the path was set to "/dev/tty2". After stopping syslog-ng, removing the regular file /dev/tty10 and starting syslog-ng again, the /dev filesystem was usable again and syslog-ng went back to smoothly collecting and storing logs:
root@syslog ~ # systemctl stop syslog-ng
root@syslog ~ # rm /dev/tty10
rm: remove regular file '/dev/tty10'? y
root@syslog ~ # systemctl start syslog-ng
root@syslog ~ # df -h /dev/
Filesystem Type Size Used Avail Use% Mounted on
none tmpfs 492K 0 492K 0% /dev
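To notice a filling /dev before logging stops, the df check can be turned into a simple threshold test. This is only a sketch: the function names and the default threshold of 80 percent are assumptions, and it relies on the POSIX output format of df -P, where the fifth column of the second line holds the usage percentage.

```shell
#!/bin/sh
# Print the usage of the filesystem holding $1 as a bare number (percent).
fs_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn and return 1 when usage reaches the limit (assumed default: 80).
check_fs_usage() {
    path="${1:-/dev}"
    limit="${2:-80}"
    used=$(fs_usage_percent "$path")
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

Called as check_fs_usage /dev from cron or a monitoring plugin, it would have flagged the 100% usage shown at the beginning of this article.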