On a production server running CentOS 7 I stumbled on a (new?) systemd problem that systemd's journald, which is responsible for system logging (and kind of replacing syslog...). I tried to analyze what happened on the system yesterday and couldn't find any logs. Neither in /var/log, nor anything in journalctl.
The last logged entries according to journalctl were from February 18th, almost a month ago:
root@centos log]# journalctl -f
-- Logs begin at Mon 2016-02-08 15:33:13 CET. --
Feb 18 15:15:23 centos.local sudo[13290]: ansible : TTY=pts/4 ; PWD=/home/ansible ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-nvxhcickrkxejyyxiyxddxidvnexliak; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ansible/.ansible/tmp/ansible-tmp-1455804928.67-234040995840251/user; rm -rf /home/ansible/.ansible/tmp/ansible-tmp-1455804928.67-234040995840251/ >/dev/null 2>&1
Feb 18 15:15:23 centos.local ansible-user[13292]: Invoked with comment=Application user User Test ssh_key_bits=2048 update_password=always non_unique=False force=False ssh_key_type=rsa ssh_key_passphrase=None home=None append=False uid=984 ssh_key_comment=ansible-generated on centos.local group=testuser system=False state=present shell=/bin/bash expires=None ssh_key_file=None groups=None move_home=False password=NOT_LOGGING_PASSWORD name=testuser createhome=True remove=False login_class=None generate_ssh_key=None
Feb 18 15:15:23 centos.local useradd[13295]: new user: name=testuser, UID=984, GID=984, home=/home/testuser, shell=/bin/bash
Feb 18 15:15:24 centos.local unknown[13301]: [audit ansible/11589 as ansible/13214 on pts/4/10.162.214.210:39712->10.162.215.53:22]: #=== bash session ended. ===
Feb 18 15:15:24 centos.local unknown[13370]: [audit ansible/11589 as ansible/13302 on pts/4/10.162.214.210:39712->10.162.215.53:22]: #=== New bash session started. ===
Feb 18 15:15:24 centos.local unknown[13379]: [audit ansible/11589 as ansible/13302 on pts/4/10.162.214.210:39712->10.162.215.53:22]: #=== bash session ended. ===
Feb 18 15:15:24 centos.local unknown[13448]: [audit / as ansible/13380 on /10.162.214.210:39712->10.162.215.53:22]: #=== New bash session started. ===
Feb 18 15:15:24 centos.local unknown[13455]: [audit / as ansible/13380 on /10.162.214.210:39712->10.162.215.53:22]: #=== bash session ended. ===
Feb 18 15:15:24 centos.local unknown[13524]: [audit ansible/11589 as ansible/13456 on pts/4/10.162.214.210:39712->10.162.215.53:22]: #=== New bash session started. ===
Feb 18 15:15:24 centos.local sudo[13532]: ansible : TTY=pts/4 ; PWD=/home/ansible ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-ixocfinomkjmwqhxmdecoqaehteqxyed; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ansible/.ansible/tmp/ansible-tmp-1455804928.97-213542358801429/authorized_key; rm -rf /home/ansible/.ansible/tmp/ansible-tmp-1455804928.97-213542358801429/ >/dev/null 2>&1
After February 18th 15:15:24 no new entries were logged anymore. I mean in general no more logs; no logins, no kernel messages, no mail logs, nothing.
I checked if the system time was correct (it is) and also looked at the current uptime:
[root@centos log]# date
Thu Mar 17 08:42:55 CET 2016
[root@centos log]# uptime
08:43:48 up 37 days, 17:09, 2 users, load average: 0.00, 0.01, 0.05
The uptime is older than when the problem arose, so a reboot is definitely not the cause.
I suspected that maybe the systemd-journald process crashed. This would explain the immediate stop of logging. However systemctl showed that the "systemd-journald.service" is loaded and actively running:
[root@centos log]# systemctl list-units | egrep "(log|journal)"
rhel-dmesg.service loaded active exited Dump dmesg to /var/log/dmesg
rsyslog.service loaded active running System Logging Service
systemd-journal-flush.service loaded active exited Flush Journal to Persistent Storage
systemd-journald.service loaded active running Journal Service
systemd-logind.service loaded active running Login Service
systemd-journald.socket loaded active running Journal Socket
A status check on systemd-journald.service shows that the service is up for 1 month and 7 days (and running):
[root@centos log]# systemctl status systemd-journald.service
● systemd-journald.service - Journal Service
Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static; vendor preset: disabled)
Active: active (running) since Mon 2016-02-08 15:33:17 CET; 1 months 7 days ago
Docs: man:systemd-journald.service(8)
man:journald.conf(5)
Main PID: 361 (systemd-journal)
Status: "Processing requests..."
CGroup: /system.slice/systemd-journald.service
└─361 /usr/lib/systemd/systemd-journald
Feb 08 15:33:17 centos.local systemd-journal[361]: Runtime journal is using 8.0M (max allowed 189.5M, trying to leave 284.3M free of 1.8G available → current limit 189.5M).
Feb 08 15:33:17 centos.local systemd-journal[361]: Runtime journal is using 8.0M (max allowed 189.5M, trying to leave 284.3M free of 1.8G available → current limit 189.5M).
Feb 08 15:33:17 centos.local systemd-journal[361]: Journal started
Feb 08 15:33:19 centos.local systemd-journal[361]: Runtime journal is using 8.0M (max allowed 189.5M, trying to leave 284.3M free of 1.8G available → current limit 189.5M).
I checked dmesg to maybe catch a segfault or something similar but I caught this:
[root@centos log]# dmesg | tail
[3258033.227855] systemd-journald[361]: Failed to write entry (13 items, 318 bytes), ignoring: Cannot assign requested address
[3258089.234540] systemd-journald[361]: Failed to write entry (23 items, 605 bytes), ignoring: Cannot assign requested address
[3258089.236248] systemd-journald[361]: Failed to write entry (24 items, 636 bytes), ignoring: Cannot assign requested address
[3258089.238195] systemd-journald[361]: Failed to write entry (22 items, 618 bytes), ignoring: Cannot assign requested address
[3258093.202606] systemd-journald[361]: Failed to write entry (20 items, 481 bytes), ignoring: Cannot assign requested address
[3258093.209852] systemd-journald[361]: Failed to write entry (20 items, 517 bytes), ignoring: Cannot assign requested address
[3258093.212118] systemd-journald[361]: Failed to write entry (16 items, 370 bytes), ignoring: Cannot assign requested address
[3258095.090851] systemd-journald[361]: Failed to write entry (11 items, 295 bytes), ignoring: Cannot assign requested address
[3258105.616016] systemd-journald[361]: Failed to write entry (12 items, 314 bytes), ignoring: Cannot assign requested address
[3258105.618950] systemd-journald[361]: Failed to write entry (16 items, 395 bytes), ignoring: Cannot assign requested address
What the hell?! There is definitely something bad happening with systemd-journald! By reading the text I thought maybe the disk is full. But then my monitoring would have alerted me and a verification revealed all OK on /var:
root@centos log]# df -h | grep var
/dev/mapper/vgsystem-lvvar ext4 3.9G 210M 3.4G 6% /var
So the problem is somewhere else...
Of course I tried to restart the service with "systemctl restart systemd-journald.service" - but nothing happened.
I then came across a forum post at linuxquestions.org which mentioned to run verify. And that's what I did:
[root@centos log]# journalctl --verify
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system.journal
454530: unused data (entry_offset==0)
454598: invalid object
File corruption detected at /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001442e-00052bff5257e30c.journal:454598 (of 8388608 bytes, 54%).
FAIL: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001442e-00052bff5257e30c.journal (Cannot assign requested address)
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001242d-00052bdfb8cb8d83.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000010460-00052bbf48b22ed6.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000000d417-00052ba467e8f90a.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000009faf-00052b8afeaa9672.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000006b98-00052b7171ab2d26.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000003705-00052b5868f72dc3.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000000001-00052b4314b1902b.journal
The lines appearing in red show quite obviously a file corruption.
After I ran the verify command, logging started working again:
[root@centos log]# journalctl -f
-- Logs begin at Mon 2016-02-08 15:33:13 CET. --
Mar 17 09:01:05 centos.local sshd[1643]: Set /proc/self/oom_score_adj to 0
Mar 17 09:01:05 centos.local sshd[1643]: Connection from 10.162.215.204 port 51034 on 10.162.215.53 port 22
Mar 17 09:01:05 centos.local sshd[1643]: Connection closed by 10.162.215.204 [preauth]
Mar 17 09:02:01 centos.local systemd[1]: Started Session 61871 of user root.
Mar 17 09:02:01 centos.local systemd[1]: Starting Session 61871 of user root.
Mar 17 09:02:01 centos.local CROND[1724]: (root) CMD (/app/scripts/idssrvjobs/idsjobindex)
Mar 17 09:02:05 centos.local sshd[1743]: Set /proc/self/oom_score_adj to 0
Mar 17 09:02:05 centos.local sshd[1743]: Connection from 10.162.215.204 port 51202 on 10.162.215.53 port 22
Mar 17 09:02:05 centos.local sshd[1743]: Connection closed by 10.162.215.204 [preauth]
Another run of --verify still showed the same corrupted journal file, so I deleted it and then ran verify again:
[root@centos log]# rm "/run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001442e-00052bff5257e30c.journal"
rm: remove regular file ‘/run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001442e-00052bff5257e30c.journal’? y
[root@centos log]# journalctl --verify
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000001242d-00052bdfb8cb8d83.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000010460-00052bbf48b22ed6.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-000000000000d417-00052ba467e8f90a.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000009faf-00052b8afeaa9672.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000006b98-00052b7171ab2d26.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000003705-00052b5868f72dc3.journal
PASS: /run/log/journal/5050c3bd03844f988c3383ff7e5a4b52/system@11d03d9ffaa240bbb2cdbdee87d322bb-0000000000000001-00052b4314b1902b.journal
This time all checks passed.
For over 10 years I never had to worry that Linux stops logging. And even if that happened, the syslog process could be monitored.
Here systemd told me that systemd-journald is running correctly although it clearly had a problem. As this is not my first systemd problem in a production environment, I avoid or ditch systemd wherever possible. On Debian and Ubuntu I replaced systemd by the stable SysV init system. Unfortunately this system is a CentOS and systemd is very much bundled fix into the system. I just hope this was a one time problem, but I doubt that.
Update September 15th 2016:
More and more distributions have switched to systemd. I just experienced the same problem on a openSUSE 13.2 machine:
# journalctl --verify
Invalid tail monotonic timestamp██████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 49%
File corruption detected at /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-1002@e2b1911792f44ec2a5af6eef8cc2fdf1-00000000000cecf4-00053876db13a8e9.journal:000000 (of 8388608 bytes, 0%).
FAIL: /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-1002@e2b1911792f44ec2a5af6eef8cc2fdf1-00000000000cecf4-00053876db13a8e9.journal (Bad message)
Invalid tail monotonic timestamp██████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 49%
File corruption detected at /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-900@46649045ac2d45bc8543ab113c770566-0000000000064f0c-00052e4c2c98616c.journal:000000 (of 8388608 bytes, 0%).
FAIL: /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-900@46649045ac2d45bc8543ab113c770566-0000000000064f0c-00052e4c2c98616c.journal (Bad message)
Invalid tail monotonic timestamp██████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 49%
File corruption detected at /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-1000@b78fe5fd971048d6a4abf38182ad32ec-00000000000052c7-000522238f831f95.journal:000000 (of 8388608 bytes, 0%).
FAIL: /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-1000@b78fe5fd971048d6a4abf38182ad32ec-00000000000052c7-000522238f831f95.journal (Bad message)
Invalid tail monotonic timestamp██████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 49%
File corruption detected at /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-902@d37e827bc73b42bba5e732f37a4c6bdd-00000000000c5b03-000537962000a79a.journal:000000 (of 8388608 bytes, 0%).
FAIL: /var/log/journal/74fc18841ff54a7f8e09f6b58d4bad53/user-902@d37e827bc73b42bba5e732f37a4c6bdd-00000000000c5b03-000537962000a79a.journal (Bad message)
So this is a general systemd-journald problem, not just one on CentOS 7.
No comments yet.
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Office OpenSearch PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder