While implementing check_cpu_stats on a customer site, I realized the plugin didn't work on certain Linux distributions. The deployment method uses GitLab pipelines to run tests in the different distributions, and in these pipelines the plugin failed to execute on CentOS7 and RockyLinux 9 containers (but strangely worked in RockyLinux 8).
A closer look at the pipeline jobs showed the following error message:
$ /usr/lib64/nagios/plugins/check_cpu_stats.sh -w 80,50,10 -c 90,60,20
UNKNOWN: iostat does not exist, please check if command exists and PATH is correct
The error message is misleading, as the iostat command (from the sysstat package) does exist. In the background the plugin verifies that all required commands exist on the system. It does so by using the which command:
for cmd in iostat; do
  if ! `which ${cmd} >/dev/null 2>&1`; then
    echo "UNKNOWN: ${cmd} does not exist, please check if command exists and PATH is correct"
    exit ${STATE_UNKNOWN}
  fi
done
The code expects that the which command returns an exit code 0 if the command (iostat) is found. which returns an exit code 1 if the command is not found. To reproduce this:
ck@debian ~ $ which iostat; echo $?
/usr/bin/iostat
0
ck@mintp ~ $ which afakecommand; echo $?
1
But after looking closer at the centos:7 and rockylinux:9 containers (images), it turns out that the which command is not installed by default (in a minimal installation):
root@rocky9 ~ # which iostat
bash: /usr/bin/which: No such file or directory
It's actually the which command which doesn't exist on the system (what a sentence!). The plugin is therefore fooled and can't execute the verification on required commands.
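The reason the plugin is fooled is that the check only distinguishes between zero and non-zero. A POSIX shell returns exit status 127 when the command it was asked to run cannot be found, so a missing which looks the same as a missing iostat. A minimal sketch of how the two cases could be told apart (describe_lookup_status is a hypothetical helper name, not part of the plugin):

```shell
# describe_lookup_status: hypothetical helper, not taken from check_cpu_stats.sh.
# Takes the exit status of a lookup such as "which iostat" and names the cause.
describe_lookup_status() {
    case "$1" in
        0)   echo "found" ;;
        127) echo "lookup tool itself is missing" ;;  # shell could not run the tool
        *)   echo "not found" ;;
    esac
}
```

It would be used right after the lookup, for example `which iostat >/dev/null 2>&1; describe_lookup_status $?`.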
In Enterprise Linux distributions (CentOS, RHEL, RockyLinux, AlmaLinux, Oracle, and others) the which command comes from a separate package "which":
root@rocky9 ~ # dnf search which | egrep "^which\."
Last metadata expiration check: 1:26:44 ago on Thu 07 Dec 2023 12:31:00 PM UTC.
which.x86_64 : Displays where a particular program in your path is located
root@rocky9 ~ # dnf install which
As I had to find out the hard way, the which package is not installed by default in EL distributions.
On Debian-based distributions, the which command is always installed through the debianutils package. That might not be obvious, but /usr/bin/which is actually a symlink:
ck@debian ~ $ file /usr/bin/which
/usr/bin/which: symbolic link to /etc/alternatives/which
ck@debian ~ $ file /etc/alternatives/which
/etc/alternatives/which: symbolic link to /usr/bin/which.debianutils
And if you follow all the symbolic links to the actual shell script (yes, that's all there is behind the which command), you come across the debianutils package:
ck@debian ~ $ dpkg -S /usr/bin/which.debianutils
debianutils: /usr/bin/which.debianutils
To sum that up: Debian and Debian-based distributions contain the which command by default, as it is part of the debianutils package. In Enterprise Linux distributions, the which command must be installed through a separate package (which). Depending on the setup and dependencies the which package might not be installed by default.
This means that which is not a suitable command for verifying that one or more commands exist on the system.
By looking at minimal installations of Debian, Ubuntu, CentOS and Rocky Linux distributions, it seems the whereis command exists on all of them. But can we use whereis as a one to one alternative? The short answer is no. The long answer is still no, but with a surprising reason: The exit code is always 0.
It turns out that whereis shows the path of a command in the output and exits with code 0:
root@rocky9 ~ # whereis iostat; echo $?
iostat: /usr/bin/iostat /usr/share/man/man1/iostat.1.gz
0
But whereis does the same if no path was found for the command we are looking for:
root@rocky9 ~ # whereis afakecommand; echo $?
afakecommand:
0
Note the exit code 0 on an empty result? Weird, isn't it?
As whereis is part of the util-linux package, I created a feature request to ask for a different exit code when no results are showing up.
Because of this static exit code, whereis requires additional output parsing to be used as an alternative to which. Therefore a no-go.
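To illustrate what that additional output parsing could look like, here is a minimal sketch. It assumes the output format shown above (the name, a colon, then zero or more paths); whereis_line_has_path is a hypothetical helper name.

```shell
# whereis_line_has_path: hypothetical helper.
# $1 is one line of whereis output, e.g. "iostat: /usr/bin/iostat".
# Returns 0 if anything other than whitespace follows the first colon.
whereis_line_has_path() {
    paths=${1#*:}                      # drop the name and the first colon
    case "$paths" in
        *[![:space:]]*) return 0 ;;    # at least one non-blank character left
        *)              return 1 ;;    # nothing listed after the colon
    esac
}
```

A call could look like `whereis_line_has_path "$(whereis -b iostat)"`, where -b limits the search to binaries so that a man page alone does not count as a hit. It works, but it is clearly more fragile than relying on an exit code.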
Another command, which turns out to exist on all Linux distributions by default, is the command command (again, what a sentence!). The reason for this is that command is part of the POSIX utilities, which is a standard followed by (serious) UNIX Operating Systems, including Linux.
The command man page tells us that we can use the -v parameter to find the path name of a command, which is basically what which does:
Write a string to standard output that indicates the pathname or command that will be used by the shell, in the current shell execution environment (see Shell Execution Environment), to invoke command_name, but do not invoke command_name.
But what about the exit codes? Let's try and find out:
root@rocky9 ~ # command -v iostat; echo $?
/usr/bin/iostat
0
root@rocky9 ~ # command -v afakecommand; echo $?
1
Yeah, a different exit code! This means command can be used as a 1:1 alternative to which!
Why stop now when there are even more commands that could serve as an alternative? During my research on the POSIX utilities I also came across the type command. This command is able to determine whether the command we're looking for is a real command (with a path), a shell built-in, a function, an alias, or something else.
But what about the exit codes here?
root@rocky9 ~ # type iostat; echo $?
iostat is /usr/bin/iostat
0
root@rocky9 ~ # type afakecommand; echo $?
bash: type: afakecommand: not found
1
As we can see, the output shows that nothing was found for the search term. The exit code represents whether there was a result on our command search. type can therefore also be used as a which replacement.
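One caveat applies to both command -v and type: they also succeed for shell built-ins, functions and aliases, because they report whatever the shell would run. For a plugin that needs a real binary such as iostat this rarely matters. If an executable file on disk is strictly required, the output of command -v can be checked for an absolute path. A minimal sketch (is_external_command is a hypothetical helper name):

```shell
# is_external_command: hypothetical helper.
# Returns 0 only if the name resolves to an executable file with an absolute path.
is_external_command() {
    path=$(command -v "$1" 2>/dev/null) || return 1   # name is unknown to the shell
    case "$path" in
        /*) [ -x "$path" ] ;;   # absolute path: make sure it is executable
        *)  return 1 ;;         # built-in, function or alias
    esac
}
```

With this, `is_external_command cd` fails because cd resolves to a built-in, while a name that resolves to a file such as /usr/bin/iostat passes.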
Finally I decided to use command -v as a replacement for which. check_cpu_stats.sh is the first monitoring plugin (written as a Shell script) that will undergo this change. My other monitoring plugins will follow that change to not depend on an additional package - at least on EL distributions.
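The changed check could look roughly like the sketch below. It is wrapped in a function so it can be tried out on its own; the function name check_required_commands is hypothetical, and STATE_UNKNOWN=3 is an assumption based on the usual Nagios plugin exit code for UNKNOWN.

```shell
STATE_UNKNOWN=3   # assumption: standard Nagios exit code for UNKNOWN

# check_required_commands: hypothetical wrapper around the plugin's check.
# Returns STATE_UNKNOWN for the first command that cannot be resolved.
check_required_commands() {
    for cmd in "$@"; do
        if ! command -v "${cmd}" >/dev/null 2>&1; then
            echo "UNKNOWN: ${cmd} does not exist, please check if command exists and PATH is correct"
            return "${STATE_UNKNOWN}"
        fi
    done
}
```

Inside a plugin this would be called as `check_required_commands iostat || exit $?`, which keeps the original behaviour but no longer depends on the which package.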
In general the lesson learned here is to use commands from the POSIX utilities as much as possible.
To prevent running into similar distribution-specific issues, integration tests can be built around a Shell script. GitHub Actions is one possibility; in GitLab you can create CI/CD pipelines which test your Shell script in different distributions and versions. Meanwhile almost every repository provider offers such pipelines, which can be used for integration tests.
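A pipeline with real distribution images remains the proper test. As a quick local approximation, a missing package can be simulated by running the lookup with a restricted PATH. This is only a sketch under that assumption; lookup_in_path is a hypothetical helper name and the directory /nonexistent-dir is made up for the example.

```shell
# lookup_in_path: hypothetical helper.
# Runs the command lookup in a subshell with PATH set to $1,
# so the caller's own PATH stays untouched.
lookup_in_path() {
    ( PATH=$1; command -v "$2" >/dev/null 2>&1 )
}
```

For example, `lookup_in_path "$PATH" sh` succeeds, while `lookup_in_path /nonexistent-dir sh` fails the same way a lookup for a command from an uninstalled package would.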
Leo wrote on Dec 7th, 2023:
Thanks for the thorough article! I did not know whereis or type could do that
Leo wrote on Dec 7th, 2023:
The shellcheck linter does a good job of catching this potential issue