check_rancher2 1.12.0 released: Monitoring of Rancher internal certificates in the local cluster


A new version of check_rancher2, a monitoring plugin for Kubernetes clusters managed by SUSE Rancher, is now available! Version 1.12.0 introduces a new check type "local-certs" to monitor the (internal) certificates used and deployed by Rancher in the "local" cluster.

Rancher internal certificates

When installed, Rancher deploys certificates into the "local" (Rancher management) cluster under the "System" project. More precisely, these certificates are stored as Kubernetes secrets in the "cattle-system" namespace and can be seen using kubectl:

$ kubectl -n cattle-system get secret  | grep tls
cattle-webhook-ca                       kubernetes.io/tls                     2      464d
cattle-webhook-tls                      kubernetes.io/tls                     2      3h49m
serving-cert                            kubernetes.io/tls                     2      3h41m
tls-rancher                             kubernetes.io/tls                     2      4y88d
tls-rancher-internal                    kubernetes.io/tls                     2      161m
tls-rancher-internal-ca                 kubernetes.io/tls                     2      464d

After Rancher is installed, these certificates are created with a validity of one year (the CA certificates are the exception and expire further in the future). They are usually renewed only when Rancher is updated. If one or more of them expire, your Rancher cluster will likely run into problems. Because these certificates are only used internally by Rancher (unlike the Kubernetes certificates), you won't notice the problem immediately. It only shows up during specific management tasks, such as changing RBAC settings or users.
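
To look at an expiry date directly, the certificate can be pulled out of a secret and handed to openssl. The following is a minimal sketch and not part of check_rancher2: the function names days_left and secret_days_left are made up for this example, and it assumes kubectl access to the "local" cluster, GNU date and base64, and that the certificate sits under the tls.crt key (the standard key of kubernetes.io/tls secrets).

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of check_rancher2.

# days_left NOT_AFTER_EPOCH NOW_EPOCH
# Prints the number of whole days between two Unix timestamps
# (shell integer division, truncated towards zero).
# Negative values mean the certificate has already expired.
days_left() {
  local not_after=$1 now=$2
  echo $(( (not_after - now) / 86400 ))
}

# secret_days_left SECRET_NAME
# Assumes kubectl access to the "local" cluster, GNU date and GNU base64.
secret_days_left() {
  local secret=$1 enddate
  # Extract the PEM certificate and read its notAfter date,
  # which openssl prints as "notAfter=<date>".
  enddate=$(kubectl -n cattle-system get secret "$secret" \
              -o jsonpath='{.data.tls\.crt}' \
            | base64 -d \
            | openssl x509 -noout -enddate \
            | cut -d= -f2)
  days_left "$(date -d "$enddate" +%s)" "$(date +%s)"
}
```

Calling secret_days_left tls-rancher-internal would then print the remaining days for that secret. A negative number means the certificate has already expired.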

On the Rancher 2 classic UI (Cluster Manager) you can see these certificates in the "local" cluster, under the "System" project. Under "Resources" select "Secrets", then change to the tab "Certificates".

[Image: Expired Rancher internal certificates]

In the newer Cluster Explorer UI, select "Secrets" in the left navigation, then select "cattle-system" in the namespace selector (at the top) and sort the list by "Kind". The certificates should show up as "TLS Certificate".

Newer Rancher releases include a fix that automatically renews these internal certificates when a certificate is within 30 days of its expiration date.

The Rancher 2.5 documentation says:

In Rancher v2.5.12 and up, rancher-webhook deployments will automatically renew their TLS certificate when it is within 30 or fewer days of its expiration date

The Rancher 2.6 documentation says something similar:

In Rancher v2.6.3 and up, rancher-webhook deployments will automatically renew their TLS certificate when it is within 30 or fewer days of its expiration date.

The documentation only mentions the rancher-webhook certificate. The other certificates (serving-cert, tls-rancher and tls-rancher-internal) are unfortunately not documented, yet they also expire and may cause problems.

Monitoring Rancher internal certificates

As mentioned, check_rancher2 version 1.12.0 can now monitor these internal certificates with the "local-certs" check type:

$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t local-certs
CHECK_RANCHER2 CRITICAL - 3 certificate(s) expired (cattle-webhook-tls expired 98 days ago - serving-cert expired 82 days ago - tls-rancher-internal expired 98 days ago -)|'total_certs'=6;;;; 'expired_certs'=3;;;; 'warning_certs'=0;;;; 'ignored_certs'=0;;;;
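
The part after the "|" in this output is performance data in the usual monitoring plugin format ('label'=value;warn;crit;min;max). If you want to reuse a single counter in your own scripts, a small helper can extract it. This is only a sketch: the function name perf_value is made up for this example and the parsing assumes the single-quoted labels shown above.

```shell
#!/usr/bin/env bash
# Sketch only: perf_value is a hypothetical helper, not part of check_rancher2.

# perf_value PLUGIN_OUTPUT LABEL
# Prints the integer value of one performance data label, e.g. expired_certs.
# Prints nothing if the label is not found.
perf_value() {
  local output=$1 label=$2 perfdata
  # Keep everything after the first "|" (the performance data part).
  perfdata=${output#*|}
  # Labels are single-quoted, e.g. 'expired_certs'=3;;;;
  printf '%s\n' "$perfdata" \
    | sed -n "s/.*'${label}'=\([0-9][0-9]*\).*/\1/p"
}
```

With the output above stored in a variable, perf_value "$output" expired_certs yields the number of expired certificates.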

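By default the plugin only checks for certificates that have already expired. To be alerted before certificates expire, add the --cert-warn parameter with the number of days in advance (here 14 days). In this example the output stays the same, because the three certificates have already expired and no other certificate is within the warning period: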

$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t local-certs --cert-warn 14
CHECK_RANCHER2 CRITICAL - 3 certificate(s) expired (cattle-webhook-tls expired 98 days ago - serving-cert expired 82 days ago - tls-rancher-internal expired 98 days ago -)|'total_certs'=6;;;; 'expired_certs'=3;;;; 'warning_certs'=0;;;; 'ignored_certs'=0;;;;
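
The idea behind such a warning threshold can be sketched in a few lines of shell. Note that this is not the plugin's actual code: the function name classify_cert is made up, and both the "expired means fewer than 0 days left" rule and the inclusive comparison against the warning threshold are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the logic of check_rancher2, just an illustration
# of how a warning threshold in days can be evaluated.

# classify_cert DAYS_LEFT [WARN_DAYS]
# DAYS_LEFT: remaining validity in days (negative = already expired).
# WARN_DAYS: optional warning threshold; 0 or unset disables warnings.
classify_cert() {
  local days_left=$1 warn_days=${2:-0}
  if (( days_left < 0 )); then
    echo "CRITICAL"   # certificate has expired (assumed rule)
  elif (( warn_days > 0 && days_left <= warn_days )); then
    echo "WARNING"    # expires within the warning period (assumed inclusive)
  else
    echo "OK"
  fi
}
```

For example, classify_cert -98 14 prints CRITICAL, classify_cert 10 14 prints WARNING, and classify_cert 10 without a threshold prints OK.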

The plugin also lets you exclude one or more certificates from the check:

$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t local-certs --cert-warn 14 -i tls-rancher-internal
CHECK_RANCHER2 CRITICAL - 2 certificate(s) expired (cattle-webhook-tls expired 98 days ago - serving-cert expired 82 days ago -) - 1 certificate(s) ignored: tls-rancher-internal|'total_certs'=6;;;; 'expired_certs'=2;;;; 'warning_certs'=0;;;; 'ignored_certs'=1;;;;

This new monitoring check should help to fix a number of Rancher Kubernetes clusters before they run into problems. Expired Kubernetes or Rancher internal certificates are among the most widely reported issues.

