check_rancher2 1.11.0 released: Allow ignoring specific workloads, treat provisioning cluster differently

A new version of check_rancher2, a monitoring plugin for Kubernetes clusters managed by SUSE Rancher, is now available! Version 1.11.0 brings one enhancement and one fix.

Enhancement: Allow ignoring specific workloads

When doing a mass-check of all workloads within a project, you might come across some broken workloads. Of course, no workload should be broken, but there could be a known error still waiting for a fix, or for a developer to come back from vacation, who knows. In the following example I came across a specific workload that kept restarting itself:

$ ./check_rancher2.sh -H rancher.example.com -U token-xxxxx -P secret -S -t workload -p c-xxxxx:p-xxxxx
CHECK_RANCHER2 WARNING - 1 workload(s) in warning state: Workload efk-filebeat is updating - |'workloads_total'=3;;;; 'workloads_errors'=0;;;; 'workloads_warnings'=1;;;; 'workloads_paused'=0;;;; 'workloads_ignored'=0;;;;

Although the developers were informed about this alert and the reason for it (insufficient container limits), the workload was still not fixed. As this workload is not considered production critical, from a monitoring perspective this is just annoying. You might get used to this "always warning" state and then fail to notice when a production workload joins efk-filebeat in the warning state.

This is why the new version lets you use the existing -i / --ignore parameter not only to ignore a certain workload status (such as updating), but also to ignore a specific workload by name:

$ ./check_rancher2.sh -H rancher.example.com -U token-xxxxx -P secret -S -t workload -p c-xxxxx:p-xxxxx  -i efk-filebeat
CHECK_RANCHER2 OK - All workloads (3) in project c-xxxxx:p-xxxxx are healthy/active - Workload efk-filebeat is ignored -|'workloads_total'=3;;;; 'workloads_errors'=0;;;; 'workloads_warnings'=0;;;; 'workloads_paused'=0;;;; 'workloads_ignored'=1;;;;

The check_rancher2 plugin now exits with OK but mentions the efk-filebeat workload specifically in the output (so you won't forget that it is being ignored).
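
If you want to keep an eye on how many workloads are currently ignored, the performance data after the pipe character can be read back out of the plugin output. The following is only a rough sketch: perfdata_value is a hypothetical helper (not part of check_rancher2), and it assumes the perfdata labels look exactly like those in the output above.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_rancher2.
# $1 = perfdata label, $2 = full plugin output line.
# Prints the integer value of the label, or nothing if the label is absent.
perfdata_value() {
  # Keep only what follows the first '|', then pull out the number after 'label'=
  printf '%s\n' "${2#*"|"}" | sed -n "s/.*'$1'=\([0-9][0-9]*\).*/\1/p"
}

# Plugin output copied from the example above
output="CHECK_RANCHER2 OK - All workloads (3) in project c-xxxxx:p-xxxxx are healthy/active - Workload efk-filebeat is ignored -|'workloads_total'=3;;;; 'workloads_errors'=0;;;; 'workloads_warnings'=0;;;; 'workloads_paused'=0;;;; 'workloads_ignored'=1;;;;"

perfdata_value workloads_ignored "$output"   # prints 1
```

In practice your monitoring system most likely stores these values for graphing already; the snippet is mainly handy for quick checks on the command line.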

Multiple workloads can be ignored by passing them as a comma-separated list. For example:

-i "efk-filebeat,another-workload,yetanother-workload-name"
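
To illustrate how such a comma-separated list can be evaluated, here is a minimal sketch in plain shell. It is not the code used inside check_rancher2, whose matching logic may differ; is_ignored and the workload name nginx-frontend are made up for this example. The sketch only matches exact workload names.

```shell
#!/bin/sh
# Hypothetical helper, not taken from check_rancher2.
# $1 = workload name, $2 = comma-separated ignore list.
# Returns 0 if the name is an exact entry of the list, 1 otherwise.
is_ignored() {
  # Wrapping both sides in commas avoids partial matches (e.g. "efk" vs "efk-filebeat")
  case ",$2," in
    *",$1,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

ignore="efk-filebeat,another-workload,yetanother-workload-name"

# nginx-frontend is an invented workload name for demonstration purposes
for workload in efk-filebeat nginx-frontend another-workload; do
  if is_ignored "$workload" "$ignore"; then
    echo "$workload: ignored"
  else
    echo "$workload: checked"
  fi
done
```

With the list above, efk-filebeat and another-workload are reported as ignored, while nginx-frontend would still be checked.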

The documentation of check_rancher2 has been updated accordingly.

Fix: Do not treat a provisioning cluster as CRITICAL alert

As mentioned by Steffen Eichler, a cluster currently being provisioned should not cause a CRITICAL alert, and that totally makes sense. The original idea (in PR #39) was to handle this state as OK. However, I have already seen some Kubernetes clusters get stuck during provisioning; the status then simply remains provisioning. You definitely want to know when something is stuck, so the plugin now alerts with a WARNING instead of a CRITICAL. As the provisioning of a cluster shouldn't take much longer than a few minutes (I have seen up to 30 minutes in some scenarios), the alert should clear once the cluster has finished provisioning and changed to active/healthy.
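
The idea can be sketched as a simple mapping from cluster state to the standard monitoring plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL). This is a simplified illustration and not the plugin's actual code: cluster_state_to_exit_code is a hypothetical function, and it assumes that only the states active and provisioning need special handling, with every other state treated as CRITICAL.

```shell
#!/bin/sh
# Hypothetical mapping, not taken from check_rancher2.
# $1 = cluster state as reported by Rancher; prints the plugin exit code.
cluster_state_to_exit_code() {
  case "$1" in
    active)       echo 0 ;;  # OK
    provisioning) echo 1 ;;  # WARNING: expected to clear, but may be stuck
    *)            echo 2 ;;  # CRITICAL: assumption for any other state
  esac
}

for state in active provisioning unavailable; do
  echo "$state -> exit code $(cluster_state_to_exit_code "$state")"
done
```

Using WARNING rather than OK for provisioning is the compromise described above: a freshly created cluster does not page anyone, but a cluster that never leaves this state stays visible.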
