After a physical server was upgraded from Debian 10 (Buster) to 11 (Bullseye), one Virtual Machine started to show a significant decrease in performance. Monitoring stats showed pretty clearly that IOWAIT inside the guest OS was responsible for the slowness:
Yes, these stats are pretty bad. Especially bad, when the host itself is running with SSD drives.
So what has changed?
This particular VM was running Ubuntu 18.04 as guest OS and was upgraded to Ubuntu 20.04. At the same time the host's OS was upgraded from Debian 10 (Buster) to 11 (Bullseye).
The VM itself is a Kubernetes node and therefore part of a Kubernetes cluster. The guest OS upgrade also changes the container runtime:
The host's OS upgrade also changes the hypervisor (KVM/QEMU) and the VM management layer (libvirt):
Following the reboot of the host, we can see the VM running much slower, with a very high percentage of IOWAIT. These are the CPU stats seen by the guest OS itself:
You don't need to be a pro in analytics to see the major change since the VM was started again under the new Host OS.
But why would a newer Hypervisor version be slower than before?
How a VM is created using libvirt and QEMU can have a significant impact on a Virtual Machine's performance. I have already seen this in the past where I needed to analyze packet losses on a Ubuntu 18.04 KVM guest running on a Debian 9 physical host. Back then the culprit was the VM suffering from packet losses didn't use the virtio network driver. But in this new case the VM is already configured to use virtio, also and especially on the virtual disk (raw device).
Let's create a new Ubuntu 20.04 VM, now under Debian Bullseye, and see what the XML definition of that VM will look like.
root@bullseye:~# virt-install --connect qemu:///system --virt-type kvm -n kvmtest --import --memory 2048 --vcpus 2 --disk /dev/vgdata/kvmtest --network=bridge:virbr1 --graphics vnc,password=secret,port=5999 --noautoconsole --os-variant ubuntu20.04
Starting install...
Domain creation completed.
By using virsh dumpxml kvmtest the VM's virtual hardware configuration can be retrieved. And we can compare this output from the dumpxml of the VM having the IOWAIT issues. The relevant differences are:
root@bullseye:~# diff /tmp/kvmtest.xml /tmp/vm-with-iowait.xml
1c1
< <domain type='kvm' id='5'>
---
> <domain type='kvm' id='4'>
6c6
< <libosinfo:os id="http://ubuntu.com/ubuntu/20.04"/>
---
> <libosinfo:os id="http://ubuntu.com/ubuntu/18.04"/>
16c16
< <type arch='x86_64' machine='pc-q35-5.2'>hvm</type>
---
> <type arch='x86_64' machine='pc-q35-3.1'>hvm</type>
First up: libosinfo.
Yes, our vm-with-iowait is meanwhile running Ubuntu 20.04 but the OS version still mentions Ubuntu 18.04 in the VM's configuration. But is this information actually relevant for a VM performance? After all the <libosinfo> context is located inside the <metadata> context. Unfortunately this is not well (if not at all) documented. But a hint can be found in an old RedHat mailing list entry from Cole Robinson:
Right now in virt-manager we only track a VM's OS name (win10, fedora28, etc.) during the VM install phase. This piece of data is important post-install though: if the user adds a new disk to the VM later, we want to be able to ask libosinfo about what devices the installed OS supports, so we can set optimal defaults, like enabling virtio. [...] I want to add something similar to virt-manager but it seems a shame to invent our own private schema for something that most non-trivial virt apps will want to know about. I was thinking a schema we could document with libosinfo, something like
<metadata>
<libosinfo
xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
<os-id>http://fedoraproject.org/fedora/28</os-id>
</libosinfo>
</metadata>
Updating the OS version in the XML is therefore a wise choice, but, according to this information, has only an effect of adding devices. And as mentioned before, virtio drivers are already configured for the virtual devices.
And then there is the VM type.
Unfortunately this is even less documented than libosinfo. Either I was looking at the completely wrong parts of the Internet or the documentation got lost/deleted, the documentation is somewhere inside the source code or it just never was documented officially in the first place. Yes, sure, there are some hints that this defines the "machine type" - but what impact this has on a VM is not really described.
The best (and pretty much only) description yet was found on the Ubuntu server guide on Virtualization with QEMU:
If you are unsure what this is, you might consider this as buying (virtual) Hardware of the same spec but a newer release date. You are encouraged in general and might want to update your machine type of an existing defined guests in particular to:
to pick up latest security fixes and features
continue using a guest created on a now unsupported release
In general it is recommended to update machine types when upgrading qemu/kvm to a new major version. But this can likely never be an automated task as this change is guest visible. The guest devices might change in appearance, new features will be announced to the guest and so on. Linux is usually very good at tolerating such changes, but it depends so much on the setup and workload of the guest that this has to be evaluated by the owner/admin of the system. Other operating systems where known to often have severe impacts by changing the hardware. Consider a machine type change similar to replacing all devices and firmware of a physical machine to the latest revision - all considerations that apply there apply to evaluating a machine type upgrade as well.
Reading this, the <type> setting of a QEMU Virtual Machine is therefore something like the "firmware version" of a VM. Or in terms of VMware: Virtual Hardware version.
Looking at the diff above again shows the VM with our IOWAIT issue runs a type pc-q35-3.1. The newly created VM from scratch shows a type pc-q35-5.2 . Looking closer at this number reveals that 3.1 and 5.2 actually identify the installed QEMU version (see upgrade notes above). But could this really have such a strong impact on a VM's performance? Let's try.
Both information in libosinfo and type were changed on our problem VM using virsh edit. The values were adjusted to the ones seen in the difference of the new VM kvmtest.
Besides this, swap was also disabled on this VM (inside /etc/fstab) as the Ubuntu 20.04 guest is a node and part of a Kubernetes cluster.
But these changes are not applied immediately. The VM needs to be shut down and started again in order to read the updated XML definition.
Once the VM in question started up again, the stats were checked during the next few hours. And what a significant change this shows!
Once the Ubuntu 20.04 VM was booted again with the updated XML specs and swap disabled no IOWAIT could be seen anymore. And to make this even nicer: The performance turned out to be even better than before, when running as a Ubuntu 18.04 VM under a Debian Buster Host!
But which one of these changes helped fix the IOWAIT issue?
The question remains open whether the QEMU/libvirt XML changes or the swap change solved the performance issue. But I didn't need to look far, because another VM showed the exact same high IOWAIT stats. And this VM also has the exact same upgrade path behind it:
Time to find out what exactly changes, when libosinfo and the VM machine type are adjusted.
Before having made any actual changes, several information was collected:
After libosinfo and machine type were adjusted, again by using virsh edit, and this VM restarted, the output of the previous commands were saved and compared. Here are the changes inside the VM.
The output of journalct -k was saved. First after booting with the old libosinfo and machine type pc-q35-3.1 and again with the updated libosinfo and machine type pc-q35-5.2.
$ diff boot-3.1.txt boot-5.2.txt
28,29c28,29
< kvm-clock: cpu 0, msr 32d201001, primary cpu clock
< kvm-clock: using sched offset of 9680708323 cycles
---
> kvm-clock: cpu 0, msr 435a01001, primary cpu clock
> kvm-clock: using sched offset of 16819579002 cycles
186c186
< kvm-clock: cpu 1, msr 32d201041, secondary cpu clock
---
> kvm-clock: cpu 1, msr 435a01041, secondary cpu clock
191c191
< kvm-clock: cpu 2, msr 32d201081, secondary cpu clock
---
> kvm-clock: cpu 2, msr 435a01081, secondary cpu clock
196c196
< kvm-clock: cpu 3, msr 32d2010c1, secondary cpu clock
---
> kvm-clock: cpu 3, msr 435a010c1, secondary cpu clock
201c201
< kvm-clock: cpu 4, msr 32d201101, secondary cpu clock
---
> kvm-clock: cpu 4, msr 435a01101, secondary cpu clock
206c206
< kvm-clock: cpu 5, msr 32d201141, secondary cpu clock
---
> kvm-clock: cpu 5, msr 435a01141, secondary cpu clock
211c211
< kvm-clock: cpu 6, msr 32d201181, secondary cpu clock
---
> kvm-clock: cpu 6, msr 435a01181, secondary cpu clock
216c216
< kvm-clock: cpu 7, msr 32d2011c1, secondary cpu clock
---
> kvm-clock: cpu 7, msr 435a011c1, secondary cpu clock
643c643,644
< lpc_ich 0000:00:1f.0: I/O space for GPIO uninitialized
---
> input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input4
> input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input3
644a646
> lpc_ich 0000:00:1f.0: I/O space for GPIO uninitialized
645a648,649
> virtio_blk virtio2: [vda] 209715200 512-byte logical blocks (107 GB/100 GiB)
> vda: vda1
648,650d651
< virtio_blk virtio2: [vda] 209715200 512-byte logical blocks (107 GB/100 GiB)
< input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input4
< input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input3
656d656
< virtio_net virtio0 enp1s0: renamed from eth0
670c670
< usb 1-1: SerialNumber: 42
---
> usb 1-1: SerialNumber: 28754-0000:00:1d.7-1
The first thing which looks like a difference is a different msr value in the kvm-clock output. However the msr value is not a fixed value, as it contains dynamic elements, such as time. This explains the changed value and actually shows that in terms of CPU nothing has changed.
A bit further down we see a VirtualPS/2 (virtual mouse) connected, but here it's just a slightly different order of the boot output. All the entries are basically the same with one exception: The line "vda: vda1" only shows up in the newer boot output. But given that the previous line already indicates vda detected as a virtio2 drive, no major change here either.
This can also be verified by looking at the output of ls -la /sys/block/. Both vda devices show up under the same PCI device path (under 3.1 and 5.2 boot).
root@vm:~# ll /sys/block/
total 0
drwxr-xr-x 2 root root 0 Dec 9 17:07 ./
dr-xr-xr-x 13 root root 0 Dec 9 17:07 ../
lrwxrwxrwx 1 root root 0 Dec 9 17:07 loop0 -> ../devices/virtual/block/loop0/
lrwxrwxrwx 1 root root 0 Dec 9 17:07 loop1 -> ../devices/virtual/block/loop1/
lrwxrwxrwx 1 root root 0 Dec 9 17:07 loop2 -> ../devices/virtual/block/loop2/
lrwxrwxrwx 1 root root 0 Dec 9 17:07 loop3 -> ../devices/virtual/block/loop3/
lrwxrwxrwx 1 root root 0 Dec 9 17:07 loop4 -> ../devices/virtual/block/loop4/
lrwxrwxrwx 1 root root 0 Dec 9 17:07 loop5 -> ../devices/virtual/block/loop5/
lrwxrwxrwx 1 root root 0 Dec 9 17:07 loop6 -> ../devices/virtual/block/loop6/
lrwxrwxrwx 1 root root 0 Dec 9 17:07 loop7 -> ../devices/virtual/block/loop7/
lrwxrwxrwx 1 root root 0 Dec 9 17:07 sr0 -> ../devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sr0/
lrwxrwxrwx 1 root root 0 Dec 9 17:07 vda -> ../devices/pci0000:00/0000:00:02.2/0000:03:00.0/virtio2/block/vda/
Maybe some virtual hardware change can be spotted with dmidecode? Well there is indeed some change, but it only contains the updated machine type version:
$ diff dmidecode-3.1.txt dmidecode-5.2.txt
24c24
< Version: pc-q35-3.1
---
> Version: pc-q35-5.2
36c36
< Version: pc-q35-3.1
---
> Version: pc-q35-5.2
[...]
And what about the Kernel modules? Maybe the new machine type or libosinfo update triggered additional Kernel modules to be loaded?
$ diff lsmod-3.1.txt lsmod-5.2.txt
38c38
< ipt_REJECT 16384 42
---
> ipt_REJECT 16384 51
91c91
< xt_comment 16384 1426
---
> xt_comment 16384 1384
95c95
< xt_nat 16384 103
---
> xt_nat 16384 92
97,98c97,98
< xt_statistic 16384 32
< xt_tcpudp 20480 299
---
> xt_statistic 16384 30
> xt_tcpudp 20480 279
But another "disappointment". The diff shows the same modules loaded, just with a different "used by" value.
And what about the iostat output? Are the IOWAIT's gone or still here after a 10min runtime?
$ cat iostat-3.1.txt
root@vm:~# uptime && iostat -c 5
17:05:22 up 11 min, 1 user, load average: 6.33, 6.84, 4.72
Linux 5.4.0-126-generic (vm) 12/09/22 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
21.95 0.00 5.79 31.11 0.15 41.00
avg-cpu: %user %nice %system %iowait %steal %idle
5.42 0.00 3.54 24.77 0.08 66.20
avg-cpu: %user %nice %system %iowait %steal %idle
5.20 0.00 3.60 20.55 0.05 70.59
$ cat iostat-5.2.txt
root@vm:~# uptime && iostat -c 5
17:16:28 up 9 min, 1 user, load average: 5.10, 6.71, 4.21
Linux 5.4.0-126-generic (vm) 12/09/22 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
27.01 0.00 5.80 27.63 0.18 39.38
avg-cpu: %user %nice %system %iowait %steal %idle
4.84 0.00 2.97 55.14 0.05 37.00
avg-cpu: %user %nice %system %iowait %steal %idle
6.05 0.03 3.91 42.49 0.05 47.47
The IOWAIT values are still way too high.
This means: A change of libosinfo and the machine type did not change anything inside the VM and did not fix the IOWAIT issue!
In the recent past, I've already run into problems when trying to run Kubernetes on nodes with swap enabled. Funny enough this seems to have worked fine before though. Could the upgrade of docker.io and containerd be the actual cause for the IOWAIT issue?
Now with the VM running an updated libosinfo and VM machine type, swap was disabled (swapoff -a). This took a couple of minutes but eventually finished after ~5mins. And my mind was blown.
Roughly two minutes after completely disabling swap, the IOWAITs disappeared (besides one lonely spike).
Combining all these findings lead to an (unexpected) conclusion:
The fact that this IOWAIT issue did not occur before the upgrades (swap was already enabled (by mistake) before the upgrades) puts the blame on the apt-get dist-upgrade inside the guest OS. The newer docker.io and containerd packages don't seem to handle swap well (if at all).
But it could also be a combination of several facts. This VM which suffered from the IOWAIT issue is part of a cluster which does some heavy lifting, with a lot of deployments. Yet another VM, which was upgraded in the exact same way and hosted on the same physical host, is part of a much more silent Kubernetes cluster, not doing much load. Inside this second VM the IOWAIT issue cannot be reproduced.
No comments yet.
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Observability Office OpenSearch PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder