systemd-journald stopped logging on CentOS 7 (Failed to write entry)

Written by - 0 comments

Published on - Listed in Linux Systemd


On a production server running CentOS 7 I stumbled on a (new?) systemd problem that systemd's journald, which is responsible for system logging (and kind of replacing syslog...). I tried to analyze what happened on the system yesterday and couldn't find any logs. Neither in /var/log, nor anything in journalctl. 

The last logged entries according to journalctl were from February 18th, almost a month ago:

root@centos log]# journalctl -f
-- Logs begin at Mon 2016-02-08 15:33:13 CET. --
Feb 18 15:15:23 centos.local sudo[13290]:  ansible : TTY=pts/4 ; PWD=/home/ansible ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-nvxhcickrkxejyyxiyxddxidvnexliak; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ansible/.ansible/tmp/ansible-tmp-1455804928.67-234040995840251/user; rm -rf /home/ansible/.ansible/tmp/ansible-tmp-1455804928.67-234040995840251/ >/dev/null 2>&1
Feb 18 15:15:23 centos.local ansible-user[13292]: Invoked with comment=Application user User Test ssh_key_bits=2048 update_password=always non_unique=False force=False ssh_key_type=rsa ssh_key_passphrase=None home=None append=False uid=984 ssh_key_comment=ansible-generated on centos.local group=testuser system=False state=present shell=/bin/bash expires=None ssh_key_file=None groups=None move_home=False password=NOT_LOGGING_PASSWORD name=testuser createhome=True remove=False login_class=None generate_ssh_key=None
Feb 18 15:15:23 centos.local useradd[13295]: new user: name=testuser, UID=984, GID=984, home=/home/testuser, shell=/bin/bash
Feb 18 15:15:24 centos.local unknown[13301]: [audit ansible/11589 as ansible/13214 on pts/4/10.162.214.210:39712->10.162.215.53:22]: #=== bash session ended. ===
Feb 18 15:15:24 centos.local unknown[13370]: [audit ansible/11589 as ansible/13302 on pts/4/10.162.214.210:39712->10.162.215.53:22]: #=== New bash session started. ===
Feb 18 15:15:24 centos.local unknown[13379]: [audit ansible/11589 as ansible/13302 on pts/4/10.162.214.210:39712->10.162.215.53:22]: #=== bash session ended. ===
Feb 18 15:15:24 centos.local unknown[13448]: [audit / as ansible/13380 on /10.162.214.210:39712->10.162.215.53:22]: #=== New bash session started. ===
Feb 18 15:15:24 centos.local unknown[13455]: [audit / as ansible/13380 on /10.162.214.210:39712->10.162.215.53:22]: #=== bash session ended. ===
Feb 18 15:15:24 centos.local unknown[13524]: [audit ansible/11589 as ansible/13456 on pts/4/10.162.214.210:39712->10.162.215.53:22]: #=== New bash session started. ===
Feb 18 15:15:24 centos.local sudo[13532]:  ansible : TTY=pts/4 ; PWD=/home/ansible ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-ixocfinomkjmwqhxmdecoqaehteqxyed; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ansible/.ansible/tmp/ansible-tmp-1455804928.97-213542358801429/authorized_key; rm -rf /home/ansible/.ansible/tmp/ansible-tmp-1455804928.97-213542358801429/ >/dev/null 2>&1

After February 18th 15:15:24 no new entries were logged anymore. I mean in general no more logs; no logins, no kernel messages, no mail logs, nothing.
I checked if the system time was correct (it is) and also looked at the current uptime:

[root@centos log]# date
Thu Mar 17 08:42:55 CET 2016

[root@centos log]# uptime
 08:43:48 up 37 days, 17:09,  2 users,  load average: 0.00, 0.01, 0.05

The uptime is older than when the problem arose, so a reboot is definitely not the cause.

I suspected that maybe the systemd-journald process crashed. This would explain the immediate stop of logging. However systemctl showed that the "systemd-journald.service" is loaded and actively running:

[root@centos log]# systemctl list-units | egrep "(log|journal)"
rhel-dmesg.service                          loaded active exited    Dump dmesg to /var/log/dmesg
rsyslog.service                             loaded active running   System Logging Service
systemd-journal-flush.service               loaded active exited    Flush Journal to Persistent Storage
systemd-journald.service                    loaded active running   Journal Service
systemd-logind.service                      loaded active running   Login Service
systemd-journald.socket                     loaded active running   Journal Socket

A status check on systemd-journald.service shows that the service is up for 1 month and 7 days (and running):

[root@centos log]# systemctl status systemd-journald.service
● systemd-journald.service - Journal Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static; vendor preset: disabled)
   Active: active (running) since Mon 2016-02-08 15:33:17 CET; 1 months 7 days ago
     Docs: man:systemd-journald.service(8)
           man:journald.conf(5)
 Main PID: 361 (systemd-journal)
   Status: "Processing requests..."
   CGroup: /system.slice/systemd-journald.service
           └─361 /usr/lib/systemd/systemd-journald

Feb 08 15:33:17 centos.local systemd-journal[361]: Runtime journal is using 8.0M (max allowed 189.5M, trying to leave 284.3M free of 1.8G available → current limit 189.5M).
Feb 08 15:33:17 centos.local systemd-journal[361]: Runtime journal is using 8.0M (max allowed 189.5M, trying to leave 284.3M free of 1.8G available → current limit 189.5M).
Feb 08 15:33:17 centos.local systemd-journal[361]: Journal started
Feb 08 15:33:19 centos.local systemd-journal[361]: Runtime journal is using 8.0M (max allowed 189.5M, trying to leave 284.3M free of 1.8G available → current limit 189.5M).

 I checked dmesg to maybe catch a segfault or something similar but I caught this:

[root@centos log]# dmesg | tail
[3258033.227855] systemd-journald[361]: Failed to write entry (13 items, 318 bytes), ignoring: Cannot assign requested address
[3258089.234540] systemd-journald[361]: Failed to write entry (23 items, 605 bytes), ignoring: Cannot assign requested address
[3258089.236248] systemd-journald[361]: Failed to write entry (24 items, 636 bytes), ignoring: Cannot assign requested address
[3258089.238195] systemd-journald[361]: Failed to write entry (22 items, 618 bytes), ignoring: Cannot assign requested address
[3258093.202606] systemd-journald[361]: Failed to write entry (20 items, 481 bytes), ignoring: Cannot assign requested address
[3258093.209852] systemd-journald[361]: Failed to write entry (20 items, 517 bytes), ignoring: Cannot assign requested address
[3258093.212118] systemd-journald[361]: Failed to write entry (16 items, 370 bytes), ignoring: Cannot assign requested address
[3258095.090851] systemd-journald[361]: Failed to write entry (11 items, 295 bytes), ignoring: Cannot assign requested address
[3258105.616016] systemd-journald[361]: Failed to write entry (12 items, 314 bytes), ignoring: Cannot assign requested address
[3258105.618950] systemd-journald[361]: Failed to write entry (16 items, 395 bytes), ignoring: Cannot assign requested address

What the hell?! There is definitely something bad happening with systemd-journald! By reading the text I thought maybe the disk is full. But then my monitoring would have alerted me and a verification revealed all OK on /var:

root@centos log]# df -h | grep var
/dev/mapper/vgsystem-lvvar     ext4         3.9G  210M  3.4G   6% /var

So the problem is somewhere else...
Of course I tried to restart the service with "systemctl restart systemd-journald.service" - but nothing happened.
I then came across a forum post at linuxquestions.org which mentioned to run verify. And that's what I did:

[root@centos log]# journalctl --verify
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system.journal
454530: unused data (entry_offset==0)
454598: invalid object
File corruption detected at /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001442e-00052bff5257e30c.journal:454598 (of 8388608 bytes, 54%).

FAIL: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001442e-00052bff5257e30c.journal (Cannot assign requested address)
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001242d-00052bdfb8cb8d83.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000010460-00052bbf48b22ed6.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000000d417-00052ba467e8f90a.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000009faf-00052b8afeaa9672.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000006b98-00052b7171ab2d26.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000003705-00052b5868f72dc3.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000000001-00052b4314b1902b.journal

 The lines appearing in red show quite obviously a file corruption.
After I ran the verify command, logging started working again:

[root@centos log]# journalctl -f
-- Logs begin at Mon 2016-02-08 15:33:13 CET. --
Mar 17 09:01:05 centos.local sshd[1643]: Set /proc/self/oom_score_adj to 0
Mar 17 09:01:05 centos.local sshd[1643]: Connection from 10.162.215.204 port 51034 on 10.162.215.53 port 22
Mar 17 09:01:05 centos.local sshd[1643]: Connection closed by 10.162.215.204 [preauth]
Mar 17 09:02:01 centos.local systemd[1]: Started Session 61871 of user root.
Mar 17 09:02:01 centos.local systemd[1]: Starting Session 61871 of user root.
Mar 17 09:02:01 centos.local CROND[1724]: (root) CMD (/app/scripts/idssrvjobs/idsjobindex)
Mar 17 09:02:05 centos.local sshd[1743]: Set /proc/self/oom_score_adj to 0
Mar 17 09:02:05 centos.local sshd[1743]: Connection from 10.162.215.204 port 51202 on 10.162.215.53 port 22
Mar 17 09:02:05 centos.local sshd[1743]: Connection closed by 10.162.215.204 [preauth]

Another run of --verify still showed the same corrupted journal file, so I deleted it and then ran verify again:

[root@centos log]# rm "/run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001442e-00052bff5257e30c.journal"
rm: remove regular file ‘/run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001442e-00052bff5257e30c.journal’? y

[root@centos log]# journalctl --verify
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001242d-00052bdfb8cb8d83.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000010460-00052bbf48b22ed6.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000000d417-00052ba467e8f90a.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000009faf-00052b8afeaa9672.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000006b98-00052b7171ab2d26.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000003705-00052b5868f72dc3.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000000001-00052b4314b1902b.journal

This time all checks passed.

For over 10 years I never had to worry that Linux stops logging. And even if that happened, the syslog process could be monitored.
Here systemd told me that systemd-journald is running correctly although it clearly had a problem. As this is not my first systemd problem in a production environment, I avoid or ditch systemd wherever possible. On Debian and Ubuntu I replaced systemd by the stable SysV init system.  Unfortunately this system is a CentOS and systemd is very much bundled fix into the system. I just hope this was a one time problem, but I doubt that.

Update September 15th 2016:
More and more distributions have switched to systemd. I just experienced the same problem on a openSUSE 13.2 machine:

# journalctl --verify
Invalid tail monotonic timestamp██████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  49%
File corruption detected at /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-1002@e2b1911792f44ec2a5af6eef8cc2fdf1-00000000000cecf4-00053876db13a8e9.journal:000000 (of 8388608 bytes, 0%).
FAIL: /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-1002@e2b1911792f44ec2a5af6eef8cc2fdf1-00000000000cecf4-00053876db13a8e9.journal (Bad message)
Invalid tail monotonic timestamp██████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  49%
File corruption detected at /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-900@46649045ac2d45bc8543ab113c770566-0000000000064f0c-00052e4c2c98616c.journal:000000 (of 8388608 bytes, 0%).
FAIL: /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-900@46649045ac2d45bc8543ab113c770566-0000000000064f0c-00052e4c2c98616c.journal (Bad message)
Invalid tail monotonic timestamp██████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  49%
File corruption detected at /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-1000@b78fe5fd971048d6a4abf38182ad32ec-00000000000052c7-000522238f831f95.journal:000000 (of 8388608 bytes, 0%).
FAIL: /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-1000@b78fe5fd971048d6a4abf38182ad32ec-00000000000052c7-000522238f831f95.journal (Bad message)
Invalid tail monotonic timestamp██████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  49%
File corruption detected at /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-902@d37e827bc73b42bba5e732f37a4c6bdd-00000000000c5b03-000537962000a79a.journal:000000 (of 8388608 bytes, 0%).
FAIL: /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-902@d37e827bc73b42bba5e732f37a4c6bdd-00000000000c5b03-000537962000a79a.journal (Bad message)
 

So this is a general systemd-journald problem, not just one on CentOS 7.


Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.

RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Office   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder