After an Ansible playbook ran on a load balancer running keepalived with additional virtual IP addresses (VIPs), all VIPs were suddenly gone and unreachable. How is that possible? What exactly did the playbook do? Not much, as it turned out. The playbook did what it was supposed to: it applied some base configuration (the same on all servers) and, as its last task, ran a system update using a (safe) upgrade. Right after this task, the VIPs were gone. Down.
Interestingly, there was no failover to the secondary load balancer, which would indicate that keepalived's VRRP communication still worked.
A look at dmesg on the affected load balancer showed something interesting in the events: systemd!
[Tue May 12 13:51:06 2020] systemd[1]: systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
[Tue May 12 13:51:06 2020] systemd[1]: Detected virtualization vmware.
[Tue May 12 13:51:06 2020] systemd[1]: Detected architecture x86-64.
[Tue May 12 13:51:06 2020] systemd[1]: Stopping Journal Service...
[Tue May 12 13:51:06 2020] systemd-journald[10688]: Received SIGTERM from PID 1 (systemd).
[Tue May 12 13:51:06 2020] systemd[1]: Stopped Journal Service.
[Tue May 12 13:51:06 2020] systemd[1]: Starting Journal Service...
[Tue May 12 13:51:06 2020] systemd[1]: Started Journal Service.
Was the systemd package updated? A look at apt's history log confirmed it:
root@loadbalancer1:~# cat /var/log/apt/history.log
Start-Date: 2020-05-12 13:50:58
Requested-By: ansible (1001)
Install: linux-modules-extra-4.15.0-99-generic:amd64 (4.15.0-99.100, automatic), linux-headers-4.15.0-99:amd64 (4.15.0-99.100, automatic), linux-modules-4.15.0-99-generic:amd64 (4.15.0-99.100, automatic), linux-headers-4.15.0-99-generic:amd64 (4.15.0-99.100, automatic), linux-image-4.15.0-99-generic:amd64 (4.15.0-99.100, automatic)
Upgrade: linux-headers-generic:amd64 (4.15.0.96.87, 4.15.0.99.89), python-samba:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), libldap-2.4-2:amd64 (2.4.45+dfsg-1ubuntu1.4, 2.4.45+dfsg-1ubuntu1.5), libwbclient0:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), libsystemd0:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), linux-image-generic:amd64 (4.15.0.96.87, 4.15.0.99.89), udev:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), libudev1:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), samba-libs:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), samba-common:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), systemd-sysv:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), libldap-common:amd64 (2.4.45+dfsg-1ubuntu1.4, 2.4.45+dfsg-1ubuntu1.5), libpam-systemd:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), systemd:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), libsmbclient:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), smbclient:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), samba-common-bin:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), sosreport:amd64 (3.9-1ubuntu0.18.04.2, 3.9-1ubuntu0.18.04.3), libmysqlclient20:amd64 (5.7.29-0ubuntu0.18.04.1, 5.7.30-0ubuntu0.18.04.1), libnss-systemd:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), linux-firmware:amd64 (1.173.17, 1.173.18), linux-generic:amd64 (4.15.0.96.87, 4.15.0.99.89)
Remove: linux-modules-extra-4.15.0-74-generic:amd64 (4.15.0-74.84)
End-Date: 2020-05-12 13:52:51
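The Upgrade line in such a history log is long and hard to scan by eye. As a minimal sketch, a small filter can reduce it to the systemd and udev related packages. The function name systemd_upgrades is hypothetical and not part of apt; only the log path comes from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every systemd/udev related entry found on the
# "Upgrade:" lines of an apt history log, one package per line.
systemd_upgrades() {
    local logfile="${1:-/var/log/apt/history.log}"
    awk '/^Upgrade: / {
        line = substr($0, 10)              # drop the "Upgrade: " prefix
        n = split(line, entry, /\), /)     # entries are separated by "), "
        for (i = 1; i <= n; i++) {
            split(entry[i], part, ":")     # part[1] is the package name
            if (part[1] ~ /systemd|udev/) {
                sub(/\)$/, "", entry[i])   # only the last entry still ends in ")"
                print entry[i] ")"
            }
        }
    }' "$logfile"
}
```

Calling systemd_upgrades without an argument reads /var/log/apt/history.log; rotated logs would need to be decompressed first.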
From the gathered information, it looked as if systemd (or its udev part) had wiped the VIPs from the system.
As there are a couple of such setups around, it did not take long to find a similar load balancer in the same state in a test environment to reproduce the issue. Maybe a manual package update would reveal more information, too?
Before the packages were updated, the currently installed versions were gathered:
root@anotherlb:~# apt-show-versions -u | grep systemd
libnss-systemd:amd64/bionic-updates 237-3ubuntu10.33 upgradeable to 237-3ubuntu10.40
libpam-systemd:amd64/bionic-updates 237-3ubuntu10.33 upgradeable to 237-3ubuntu10.40
libsystemd0:amd64/bionic-updates 237-3ubuntu10.33 upgradeable to 237-3ubuntu10.40
systemd:amd64/bionic-updates 237-3ubuntu10.33 upgradeable to 237-3ubuntu10.40
systemd-sysv:amd64/bionic-updates 237-3ubuntu10.33 upgradeable to 237-3ubuntu10.40
And the VIP showed up just fine in the ip a output:
root@anotherlb:~# ip a
1: lo:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens192:
link/ether 00:50:56:8d:fb:45 brd ff:ff:ff:ff:ff:ff
inet 192.168.22.141/25 brd 194.40.216.255 scope global ens192
valid_lft forever preferred_lft forever
inet 192.168.22.140/32 scope global ens192 <<<< this is the VIP
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe8d:fb45/64 scope link
valid_lft forever preferred_lft forever
Let's do the package update, selecting only the systemd packages:
root@anotherlb:~# apt-get install systemd systemd-sysv
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
linux-headers-4.15.0-62 linux-headers-4.15.0-62-generic linux-image-4.15.0-62-generic linux-modules-4.15.0-62-generic
linux-modules-extra-4.15.0-62-generic
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
libnss-systemd libpam-systemd libsystemd0
Suggested packages:
systemd-container
The following packages will be upgraded:
libnss-systemd libpam-systemd libsystemd0 systemd systemd-sysv
5 upgraded, 0 newly installed, 0 to remove and 83 not upgraded.
Need to get 3,346 kB of archives.
After this operation, 57.3 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://ch.archive.ubuntu.com/ubuntu bionic-updates/main amd64 libnss-systemd amd64 237-3ubuntu10.40 [104 kB]
Get:2 http://ch.archive.ubuntu.com/ubuntu bionic-updates/main amd64 systemd-sysv amd64 237-3ubuntu10.40 [14.4 kB]
Get:3 http://ch.archive.ubuntu.com/ubuntu bionic-updates/main amd64 libpam-systemd amd64 237-3ubuntu10.40 [107 kB]
Get:4 http://ch.archive.ubuntu.com/ubuntu bionic-updates/main amd64 systemd amd64 237-3ubuntu10.40 [2,913 kB]
Get:5 http://ch.archive.ubuntu.com/ubuntu bionic-updates/main amd64 libsystemd0 amd64 237-3ubuntu10.40 [207 kB]
Fetched 3,346 kB in 9s (392 kB/s)
(Reading database ... 141281 files and directories currently installed.)
Preparing to unpack .../libnss-systemd_237-3ubuntu10.40_amd64.deb ...
Unpacking libnss-systemd:amd64 (237-3ubuntu10.40) over (237-3ubuntu10.33) ...
Preparing to unpack .../systemd-sysv_237-3ubuntu10.40_amd64.deb ...
Unpacking systemd-sysv (237-3ubuntu10.40) over (237-3ubuntu10.33) ...
Preparing to unpack .../libpam-systemd_237-3ubuntu10.40_amd64.deb ...
Unpacking libpam-systemd:amd64 (237-3ubuntu10.40) over (237-3ubuntu10.33) ...
Preparing to unpack .../systemd_237-3ubuntu10.40_amd64.deb ...
Unpacking systemd (237-3ubuntu10.40) over (237-3ubuntu10.33) ...
Preparing to unpack .../libsystemd0_237-3ubuntu10.40_amd64.deb ...
Unpacking libsystemd0:amd64 (237-3ubuntu10.40) over (237-3ubuntu10.33) ...
Setting up libsystemd0:amd64 (237-3ubuntu10.40) ...
Setting up systemd (237-3ubuntu10.40) ...
Failed to try-restart systemd-resolved.service: Unit systemd-resolved.service is masked.
Setting up libnss-systemd:amd64 (237-3ubuntu10.40) ...
Setting up systemd-sysv (237-3ubuntu10.40) ...
Setting up libpam-systemd:amd64 (237-3ubuntu10.40) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for dbus (1.12.2-1ubuntu1.1) ...
Processing triggers for ureadahead (0.100.0-21) ...
Right after the "Setting up systemd" step, the package tried to restart the systemd-resolved service. On this particular machine, however, that service is masked (on purpose!).
root@anotherlb:~# systemctl list-unit-files | grep resolved
systemd-resolved.service masked
The big question now: is the masked systemd-resolved unit to blame, or would this happen anyway after a systemd and udev restart?
The current systemd source package for Ubuntu Bionic (18.04) can be downloaded; the extracted systemd_237-3ubuntu10.40.debian.tar.xz contains an interesting part in the systemd.postinst file, starting on line 42:
42 # Enable resolved by default on new installs installs and upgrades
43 if dpkg --compare-versions "$2" lt "234-1ubuntu2~"; then
44 systemctl enable systemd-resolved.service || true
45 fi
and later on:
149 # skip daemon-reexec and try-restarts during shutdown to avoid hitting LP: #1803391
150 if [ -n "$2" ] && [ "$(systemctl is-system-running)" != "stopping" ]; then
151 _systemctl daemon-reexec || true
152 # don't restart logind; this can be done again once this gets implemented:
153 # https://github.com/systemd/systemd/issues/1163
154 _systemctl try-restart systemd-networkd.service || true
155 _systemctl try-restart systemd-resolved.service || true
156 _systemctl try-restart systemd-timesyncd.service || true
157 _systemctl try-restart systemd-journald.service || true
158 fi
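The try-restart targets in the excerpt above can also be listed from the postinst script of the installed package. This is only a sketch: the function name restart_targets is hypothetical, and the default path assumes dpkg keeps the maintainer script at /var/lib/dpkg/info/systemd.postinst, as it usually does on Debian and Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the units a postinst script passes to
# "systemctl try-restart", de-duplicated and sorted.
restart_targets() {
    local postinst="${1:-/var/lib/dpkg/info/systemd.postinst}"
    grep -oE 'try-restart [A-Za-z0-9@._-]+\.service' "$postinst" \
        | awk '{ print $2 }' \
        | sort -u
}
```

Running restart_targets before an upgrade gives a quick list of services that will be touched, which helps to decide what needs to be watched during the update.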
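In short, the postinst script enables systemd-resolved only when upgrading from a version older than 234-1ubuntu2~ (not the case here, coming from 237-3ubuntu10.33) and later tries to restart it using try-restart. If the restart fails, the trailing || true still makes the step succeed. This can be executed manually to see the behaviour: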
root@anotherlb:~# systemctl try-restart systemd-resolved.service || true
Failed to try-restart systemd-resolved.service: Unit systemd-resolved.service is masked.
root@anotherlb:~# echo $?
0
Even though systemd-resolved could not be restarted (because the service is masked), the exit code is 0 thanks to the || true. For the package installation/upgrade this means: all good, continue.
So far so good: the masked unit is not the problem, so it must be something else. What about the service restarted just before it, systemd-networkd? This is an enabled service, so restarting it should just work, right?
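The effect of || true can be shown without touching any real service. In this small sketch, failing_step is a hypothetical stand-in for a command that fails, such as a try-restart on a masked unit; it is not the actual postinst code.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a failing command (for example try-restart
# on a masked unit): prints an error and returns a non-zero exit code.
failing_step() {
    echo "Failed to try-restart example.service: Unit example.service is masked." >&2
    return 1
}

# Plain call: the failure is visible in the exit code.
rc=0
failing_step 2>/dev/null || rc=$?
echo "plain call: exit code $rc"        # prints "plain call: exit code 1"

# With "|| true": the failure is swallowed and the step reports success.
rc=0
{ failing_step 2>/dev/null || true; } || rc=$?
echo "with || true: exit code $rc"      # prints "with || true: exit code 0"
```

This is why the error message appears in the apt output while the upgrade itself carries on as if nothing happened.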
root@anotherlb:~# systemctl try-restart systemd-networkd.service || true
As soon as this command was fired, the VIP (continuously pinged from another terminal session) was gone again! ip a confirmed it:
root@anotherlb:~# ip a
1: lo:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens192:
link/ether 00:50:56:8d:fb:45 brd ff:ff:ff:ff:ff:ff
inet 192.168.22.141/25 brd 194.40.216.255 scope global ens192
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe8d:fb45/64 scope link
valid_lft forever preferred_lft forever
With that information at hand I wanted to report an Ubuntu bug. But it turns out the bug already exists: LP Bug #1815101! The bug was confirmed in February 2020, but so far no fix is available.
Workaround: restart keepalived right after a systemd update and the VIPs are back again.
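The workaround could be scripted roughly as follows, for example as a step right after the upgrade. This is a sketch under assumptions: the function names has_address and ensure_vip are hypothetical, the unit is assumed to be called keepalived, and the VIP and interface are the ones from the test machine above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given IPv4 address appears in
# "ip -o -4 addr show" style output read from stdin (field 4 is ADDRESS/PREFIX).
has_address() {
    local vip="$1"
    awk -v vip="$vip" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }'
}

# Hypothetical helper: restart keepalived only if the VIP is missing.
# Assumes the service unit is named "keepalived" and the script runs as root.
ensure_vip() {
    local vip="$1" iface="$2"
    if ip -o -4 addr show dev "$iface" | has_address "$vip"; then
        echo "VIP $vip is present on $iface"
    else
        echo "VIP $vip is missing on $iface, restarting keepalived"
        systemctl restart keepalived
    fi
}
```

On the test machine this would be called as ensure_vip 192.168.22.140 ens192. Note that on a load balancer currently in BACKUP state the VIP is expected to be absent, so the check only makes sense on the node that should hold it.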