check_rancher2 1.10.0 released: Handle newer Kubernetes versions, added ignore status in workload check

Written by - 0 comments

Published on - Listed in Docker Kubernetes Rancher Internet Cloud Monitoring


A new version of check_rancher2, an open source monitoring plugin for Kubernetes clusters managed by SUSE Rancher, is available! Version 1.10.0 handles an important bug when using a newer Kubernetes version in a cluster and adds other enhancements.

Fix: Handling newer Kubernetes versions

In previous Kubernetes versions, the "ComponentStatus" was used to identify issues within a Kubernetes cluster. This was also heavily used by Rancher and even integrated into the (old) user interface in the cluster overview:

The ComponentStatus not only showed up in the user interface but also in the Rancher API. The check_rancher2 monitoring plugin used the API output and obtained the current statutes from the ComponentStatus block.

However the ComponentStatus health checks were never 100% reliable in the first place (see related article Rancher 2 Kubernetes cluster shows alerts on Controller Manager and Scheduler for a technical deep dive). From that mentioned article:

ComponentStatus health checks of controller-manager and scheduler [...] are not working as they should

With the release of Kubernetes 1.19 the ComponentStatus was marked as deprecated.

Note: A working replacement for ComponentStatus was never announced.

According to the opened issue #35 it seems that the Rancher API removed the ComponentStatus output since Kubernetes version 1.22 or 1.23. Reporter jajla mentioned:

In two steps, I upgraded v1.21 cluster to v1.22, then continued with v1.23.
It's still good on v1.21, but not anymore on v1.23.

To properly handle this, check_rancher2 now tries to parse the ComponentStatus array from the API's JSON output, if the array exists. Otherwise it will silently skip that part.

Enhancement: Show Kubernetes version in single cluster check

While working on the issue with the newer Kubernetes versions mentioned above it crossed my mind that an information of the currently used Kubernetes version in a cluster would be really helpful.

Up until release 1.10.0, the output of a single cluster check (using -t cluster -c clusterid) simply showed whether or not the checked cluster is healthy:

$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t cluster -c c-xxxxx
CHECK_RANCHER2 OK - Cluster test is healthy|'cluster_healthy'=1;;;; 'component_errors'=0;;;; 'cpu'=29280;;;;52000 'memory'=29114536832B;;;0;102540578816 'pods'=185;;;;880 'usage_cpu'=56%;;;0;100 'usage_memory'=28%;;;0;100 'usage_pods'=21%;;;0;100

Now with the version 10.0.0, the output also shows the (Rancher installed) Kubernetes version:

$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t cluster -c c-xxxxx
CHECK_RANCHER2 OK - Cluster test (v1.23.10+rke2r1) is healthy|'cluster_healthy'=1;;;; 'component_errors'=0;;;; 'cpu'=29280;;;;52000 'memory'=29114536832B;;;0;102540578816 'pods'=185;;;;880 'usage_cpu'=56%;;;0;100 'usage_memory'=28%;;;0;100 'usage_pods'=21%;;;0;100

Enhancement: Ignore status on workload checks

In issue #29 a new feature was requested: Ignore statuses in workloads. Ignoring certain statuses was already possible in earlier releases, however only on node checks. The same logic from the node checks was now applied on the workload checks. Certain workload stati can now be ignored.

The following example shows a workload in updating status:

$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t workload -p c-xxxxx:p-xxxxx
CHECK_RANCHER2 WARNING - 1 workload(s) in warning state: Workload efk-filebeat is updating - |'workloads_total'=3;;;; 'workloads_errors'=0;;;; 'workloads_warnings'=1;;;; 'workloads_paused'=0;;;; 'workloads_ignored'=0;;;;

Note: Remember, a workload check requires the project id (-p projectid)

If you want to ignore all workloads with this "updating" status, you can now append -i updating to the command:

$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t workload -p c-xxxxx:p-xxxxx -i updating
CHECK_RANCHER2 OK - All workloads (3) in project c-fctn6:p-2xg6v are healthy/active - Workload efk-filebeat is updating but ignored -|'workloads_total'=3;;;; 'workloads_errors'=0;;;; 'workloads_warnings'=0;;;; 'workloads_paused'=0;;;; 'workloads_ignored'=1;;;;

In this case the monitoring plugin returns status OK, but shows the ignored workload(s) in the output. 

The same works of course also for a single workload. Without ignoring the updating status the plugin returns a warning:

$ ./check_rancher2.sh -H rancher.example.com -U token-xxxxx -P "secret" -S -t workload -p c-xxxxx:p-xxxxx -w efk-filebeat
CHECK_RANCHER2 WARNING - Workload efk-filebeat is updating|'workload_active'=0;;;; 'workload_error'=0;;;; 'workload_warning'=1;;;; 'workload_ignored'=0;;;;

Again, append -i updating and the plugin now returns OK (but hints that the status was ignored):

$ ./check_rancher2.sh -H rancher.example.com -U token-xxxxx -P "secret" -S -t workload -p c-xxxxx:p-xxxxx -w efk-filebeat -i updating
CHECK_RANCHER2 OK - Workload efk-filebeat is updating but ignored|'workload_active'=0;;;; 'workload_error'=0;;;; 'workload_warning'=1;;;; 'workload_ignored'=1;;;;

Important note: If you use the ignore parameter (-i) on the workload check, you need to be sure what you're doing. A workload which remains in updating state over a long time is an important indicator that something's not right with that deployment.



Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.

RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Observability   Office   OpenSearch   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder