While running some tests on a newly created LXC container with Debian 12 (Bookworm), I was stunned to see all the physical CPU cores appearing in the htop output:
This container is supposed to have 2 CPUs, which are set in the container's config file using cgroup limits. Yet the output clearly shows many more CPUs...
Did I forget to set the cgroup limits on the LXC host (running Debian 11)? I checked, and nope, the cgroup v2 CPU limits are set:
root@lxchost ~ # cat /var/lib/lxc/bookworm/config |grep cgroup
lxc.cgroup2.cpuset.cpus = 12-13
lxc.cgroup2.cpu.weight = 100
lxc.cgroup2.memory.max = 10G
lxc.cgroup2.memory.high = 10G
I also double-checked that lxcfs is installed on the host, as it is required for the containers to correctly interpret the limits. And yes, it's installed as well.
Compared to other LXC containers running on the same host, only the new Debian 12 container showed this issue.
My first thought was: Hmm... maybe something changed in Debian 12 in how the cgroup limits set on the LXC host are interpreted? Was the older LXC 4.x on the host (running Debian 11) the problem?
To verify this, I opened a topic in the LXC discussion forums, but after a hint from Stéphane Graber I quickly realized something: although the htop output above shows 24 CPUs, only the first two of them are actually used. The others remain at 0% usage. So the cgroup limits actually seem to work; they are just not reflected in the number of CPUs shown (the memory limits are shown correctly, by the way).
With these findings, the problem seems to be htop itself. Somehow the CPU information is read from the wrong place (?), or the cgroup limits for CPUs are ignored.
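A quick way to narrow down where a wrong CPU count comes from is to compare the different places a Linux process can read it from. The following is a minimal sketch, to be run inside the container; it assumes cgroup v2 and lxcfs as on this host, and the function names `show_file` and `cpu_views` are made up for illustration. It does not tell which of these sources a particular htop build actually uses.

```shell
#!/usr/bin/env bash
# Run inside the container: print how many CPUs each common source reports.
# Missing files (e.g. no cgroup v2, no lxcfs) are marked, not treated as errors.

# Print a label and the first line of a file, if the file is readable.
show_file() {
  if [ -r "$2" ]; then
    printf '%-32s %s\n' "$1" "$(head -n 1 "$2")"
  else
    printf '%-32s %s\n' "$1" "(not available)"
  fi
}

cpu_views() {
  # CPUs this process may run on (scheduler affinity, honours the cpuset)
  printf '%-32s %s\n' "nproc (sched affinity)" "$(nproc)"
  # "processor" entries in /proc/cpuinfo (virtualized by lxcfs when mounted)
  printf '%-32s %s\n' "/proc/cpuinfo processors" "$(grep -c '^processor' /proc/cpuinfo)"
  # per-CPU "cpuN" lines in /proc/stat (also provided by lxcfs when mounted)
  printf '%-32s %s\n' "/proc/stat cpuN lines" "$(grep -c '^cpu[0-9]' /proc/stat)"
  # effective cpuset of the container's cgroup (cgroup v2 only)
  show_file "cpuset.cpus.effective" /sys/fs/cgroup/cpuset.cpus.effective
  # CPUs the kernel reports as online in sysfs
  show_file "sysfs cpu/online" /sys/devices/system/cpu/online
}

cpu_views
```

In a correctly limited container, `nproc` and the cgroup cpuset should reflect the two configured CPUs (12-13). If a source still lists all 24 CPUs, a tool reading from that source would likely show them too.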
Debian Bookworm comes with htop 3.2.2:
root@bookworm:~# dpkg -l|grep htop
ii htop 3.2.2-2 amd64 interactive processes viewer
On Bullseye it was an older version 3.0.5:
root@bullseye:~# dpkg -l|grep htop
ii htop 3.0.5-7 amd64 interactive processes viewer
The changelog of the latest htop release (3.2.2) contains a line hinting at a behaviour change for containers and cgroup limits:
On Linux, improvements to cgroup and container identification
Well, maybe this caused a regression?
Let's find out by using an older version of htop!
I turned to my lab environment and decided to compile one htop version after another until the problematic version was found. Luckily htop is a pretty small program and doesn't take hours to compile, so a different release can be downloaded and compiled quickly.
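This version-by-version approach can be scripted. The following is a minimal sketch, assuming the build dependencies from the `apt install` line below are present and that every release tarball follows the same URL pattern as the 3.2.1 download (`htop-<version>.tar.xz` under the GitHub release of the same name). The script name `build-htop.sh` and the functions `htop_url` and `build_htop` are hypothetical helpers, not part of htop.

```shell
#!/usr/bin/env bash
# build-htop.sh (hypothetical helper): download and compile one or more
# upstream htop releases side by side, one directory per version.
# Assumes the same tarball URL pattern as the 3.2.1 download in this post.
set -euo pipefail

# Print the release tarball URL for a given version.
htop_url() {
  local v="$1"
  echo "https://github.com/htop-dev/htop/releases/download/${v}/htop-${v}.tar.xz"
}

# Download, unpack and build one version; the binary ends up in htop-<version>/htop.
build_htop() {
  local v="$1"
  # Skip download and unpack if the source tree already exists.
  if [ ! -d "htop-${v}" ]; then
    wget -q "$(htop_url "$v")"
    tar -xf "htop-${v}.tar.xz"
  fi
  # Same build steps as for 3.2.1, in a subshell so the working directory stays unchanged.
  (cd "htop-${v}" && ./autogen.sh && ./configure && make -j"$(nproc)")
}

# Build every version given on the command line; no arguments means nothing is built.
for v in "$@"; do
  build_htop "$v"
done
```

Running `bash build-htop.sh 3.2.1 3.2.2` would leave `./htop-3.2.1/htop` and `./htop-3.2.2/htop` next to each other, so both can be started inside the container one after the other.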
To get all the necessary compiling tools, a few packages must be installed first:
root@bookworm:~# apt install libncursesw5-dev autotools-dev autoconf automake build-essential
After this we can download, unpack and compile the older version - 3.2.1 in this case:
root@bookworm:~# wget https://github.com/htop-dev/htop/releases/download/3.2.1/htop-3.2.1.tar.xz
root@bookworm:~# tar -xf htop-3.2.1.tar.xz
root@bookworm:~# cd htop-3.2.1
root@bookworm:~/htop-3.2.1# ./autogen.sh && ./configure && make
This results in an htop binary in the same directory:
root@bookworm:~/htop-3.2.1# ls -ltr| tail
-rw-r--r-- 1 root root 30408 Nov 20 20:48 TasksMeter.o
-rw-r--r-- 1 root root 44096 Nov 20 20:48 TraceScreen.o
-rw-r--r-- 1 root root 28408 Nov 20 20:48 UptimeMeter.o
-rw-r--r-- 1 root root 8544 Nov 20 20:48 UsersTable.o
-rw-r--r-- 1 root root 29288 Nov 20 20:48 Vector.o
-rw-r--r-- 1 root root 40416 Nov 20 20:48 XUtils.o
drwxr-xr-x 3 fhadm 121 4096 Nov 20 20:48 generic
drwxr-xr-x 3 fhadm 121 4096 Nov 20 20:48 linux
drwxr-xr-x 3 fhadm 121 4096 Nov 20 20:48 zfs
-rwxr-xr-x 1 root root 1301968 Nov 20 20:48 htop
And this can be executed and compared to the other htop binary, installed through the Debian repos:
root@bookworm:~/htop-3.2.1# ./htop
The ncurses output speaks for itself:
Two CPUs are shown with htop 3.2.1, the correct number set by the cgroup limit. The problem must indeed be caused by a change in htop 3.2.2.
This looks pretty much like a regression to me and I opened up issue #1332. Hopefully this is confirmed and fixed soon, but even then, it might take quite some time until the upstream fix makes it into the Debian repositories.
To get rid of this "htop bug" in 3.2.2, I decided to build my own DEB package of htop 3.2.1 for Debian Bookworm. While preparing the htop-3.2.1 directory with debhelper, I compared it with the debian files from the original 3.2.2 package for Bookworm, and I stumbled across something interesting: the Debian package contains a single patch, which makes the code differ from the upstream source code:
Looking closer at that patch shows that LXC specific code was removed from htop's source code:
No way! Could it actually be that the problem is caused by this patch inside the deb package? Does htop 3.2.2 actually work when compiled from source? Let's find out!
After compiling htop 3.2.2 from upstream/source, the same way as shown above with 3.2.1, take a look at this screenshot:
And to my surprise, the cgroup limits are correctly handled. Only two CPUs are showing up in htop, as it's supposed to be.
The problem is therefore caused by the deb package on Debian 12, not by htop's upstream source code!
A Debian bug report (#1057466) was opened to tackle and hopefully fix this bug or regression, whatever it turns out to be.
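To check whether the Debian patch is still shipped, the source package can be fetched with `apt-get source htop` (this needs a `deb-src` line in the APT sources) and its patch list read from `debian/patches/series`, which in the 3.0 (quilt) source format lists the patches applied on top of the upstream tarball. A minimal sketch, where the script name `debian-patches.sh` and the function `list_debian_patches` are hypothetical:

```shell
#!/usr/bin/env bash
# debian-patches.sh (hypothetical helper): list the patches Debian applies on
# top of the upstream source, given an unpacked source package directory.
set -euo pipefail

list_debian_patches() {
  local series="$1/debian/patches/series"
  if [ ! -r "$series" ]; then
    echo "no patch series found in $1" >&2
    return 1
  fi
  # Skip comments and empty lines; the remaining lines are patch file names
  # (relative to debian/patches/), in the order they are applied.
  grep -Ev '^[[:space:]]*(#|$)' "$series"
}

# Usage: bash debian-patches.sh htop-3.2.2
if [ "$#" -gt 0 ]; then
  list_debian_patches "$1"
fi
```

After `apt-get source htop` has unpacked the Bookworm source into `htop-3.2.2`, running `bash debian-patches.sh htop-3.2.2` prints the patch names, and each one can be opened from `htop-3.2.2/debian/patches/` to see what it changes.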
CK from Switzerland wrote on Apr 26th, 2024:
Hi Gustavo. I still have the original htop package on my Bookworm containers. It was too much effort to maintain a separate package for this purpose. Recently a new LXC (and lxcfs) version was released, and the changelog shows a different handling in presenting the virtual CPUs to the kernel. Maybe this will positively affect how htop looks on a Bookworm container. I have not had the time to test this yet.
Gustavo B. Schenkel from Porto Alegre, RS / Brazil wrote on Apr 26th, 2024:
Hi, I've just had enough of htop on Debian LXC, which I run on my Proxmox servers, and searched the web about it. I saw your issue on the htop repository and on the Debian bug tracker.
I still see this problem today on Debian LXC, but not on Alpine Linux; the patch from the Debian devs is still in there today. So I ask: are you still using your self-built htop package?