A new version of check_rancher2, a monitoring plugin for Kubernetes clusters managed by SUSE Rancher, is now available! Version 1.12.0 introduces a new check type "local-certs" to monitor the (internal) certificates used and deployed by Rancher in the "local" cluster.
When installed, Rancher deploys certificates into the "local" (Rancher management) cluster under the "System" project. More precisely, these certificates are stored as Kubernetes secrets in the "cattle-system" namespace and can be seen using kubectl:
$ kubectl -n cattle-system get secret | grep tls
cattle-webhook-ca         kubernetes.io/tls   2   464d
cattle-webhook-tls        kubernetes.io/tls   2   3h49m
serving-cert              kubernetes.io/tls   2   3h41m
tls-rancher               kubernetes.io/tls   2   4y88d
tls-rancher-internal      kubernetes.io/tls   2   161m
tls-rancher-internal-ca   kubernetes.io/tls   2   464d
After Rancher is installed, these certificates are created with a validity of one year (the CA certificates are the exception and expire much later). They are usually only renewed when Rancher is updated. If one or more of them expire, your Rancher installation will likely run into problems. Because these certificates are only used internally by Rancher (unlike the Kubernetes certificates), you won't notice the problem immediately. It only shows up when you perform specific management tasks, such as changing RBAC settings or users.
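To see when one of these certificates actually expires, the certificate can be pulled out of its secret and decoded. The following is a minimal sketch, assuming kubectl access to the "local" cluster and that base64, openssl and GNU date are installed. The function names secret_not_after and days_between are made up for this example and are not part of check_rancher2.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of check_rancher2.

# Print the notAfter date of the certificate stored in a TLS secret.
# Assumes kubectl is configured for the "local" (Rancher management) cluster.
secret_not_after() {
  local namespace=$1 secret=$2
  kubectl -n "$namespace" get secret "$secret" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -enddate \
    | cut -d= -f2
}

# Whole days between two Unix timestamps (now, expiry), truncated toward zero.
# A negative result means the expiry date lies at least one day in the past.
days_between() {
  local now_epoch=$1 expiry_epoch=$2
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Possible usage against a live cluster (requires GNU date for "date -d"):
#   not_after=$(secret_not_after cattle-system tls-rancher-internal)
#   days_between "$(date +%s)" "$(date -d "$not_after" +%s)"
```

The arithmetic is kept in its own function so it can be reused or checked without cluster access.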
In the classic Rancher 2 UI (Cluster Manager) you can see these certificates in the "local" cluster, under the "System" project. Under "Resources" select "Secrets", then switch to the "Certificates" tab.
In the newer Cluster Explorer UI, select "Secrets" in the left navigation, then select "cattle-system" in the namespace selector (at the top) and sort the list by "Kind". The certificates should show up as "TLS Certificate".
Newer Rancher releases include a fix that automatically renews these internal certificates when a certificate is within 30 days of its expiration date.
The Rancher 2.5 documentation says:
In Rancher v2.5.12 and up, rancher-webhook deployments will automatically renew their TLS certificate when it is within 30 or fewer days of its expiration date
The Rancher 2.6 documentation contains a similar note:
In Rancher v2.6.3 and up, rancher-webhook deployments will automatically renew their TLS certificate when it is within 30 or fewer days of its expiration date.
The documentation only mentions the rancher-webhook certificate. The other certificates (serving-cert, tls-rancher and tls-rancher-internal) are unfortunately not documented, yet they expire as well and may cause problems, too.
As mentioned, check_rancher2 version 1.12.0 now allows you to monitor these internal certificates with the "local-certs" check type:
$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t local-certs
CHECK_RANCHER2 CRITICAL - 3 certificate(s) expired (cattle-webhook-tls expired 98 days ago - serving-cert expired 82 days ago - tls-rancher-internal expired 98 days ago -)|'total_certs'=6;;;; 'expired_certs'=3;;;; 'warning_certs'=0;;;; 'ignored_certs'=0;;;;
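Everything after the "|" in the plugin output is performance data in the usual Nagios plugin format ('label'=value;warn;crit;min;max), which monitoring systems typically store for graphing. As a rough sketch of how such a value could be pulled out in a script of your own, the hypothetical helper perfdata_value below (not part of check_rancher2) extracts a single counter from an output line:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of check_rancher2.
# Prints the integer value of one perfdata label from a plugin output line.
perfdata_value() {
  local output=$1 label=$2
  local perfdata=${output#*\|}      # keep only the text after the first "|"
  local re="'${label}'=([0-9]+)"    # matches e.g. 'expired_certs'=3
  if [[ $perfdata =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    return 1                        # label not found
  fi
}

# Example with the output shown above (certificate details shortened here).
output="CHECK_RANCHER2 CRITICAL - 3 certificate(s) expired|'total_certs'=6;;;; 'expired_certs'=3;;;; 'warning_certs'=0;;;; 'ignored_certs'=0;;;;"
perfdata_value "$output" expired_certs   # prints 3
```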
By default, the plugin only checks for certificates that have already expired. To be alerted before certificates expire, add the --cert-warn parameter with the number of days in advance (here 14 days):
$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t local-certs --cert-warn 14
CHECK_RANCHER2 CRITICAL - 3 certificate(s) expired (cattle-webhook-tls expired 98 days ago - serving-cert expired 82 days ago - tls-rancher-internal expired 98 days ago -)|'total_certs'=6;;;; 'expired_certs'=3;;;; 'warning_certs'=0;;;; 'ignored_certs'=0;;;;
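The general idea behind such a warning threshold can be sketched as follows. This is only an illustration of the usual Nagios plugin convention (OK, WARNING, CRITICAL), not the plugin's actual code. The function name classify_cert and the exact boundary handling are assumptions.

```shell
#!/usr/bin/env bash
# Illustrative sketch only, not taken from check_rancher2.
# Arguments: days until expiry (negative = already expired) and an
# optional warning threshold in days (0 or unset = no warning threshold).
classify_cert() {
  local days_left=$1 warn_days=${2:-0}
  if (( days_left < 0 )); then
    echo "CRITICAL"
  elif (( warn_days > 0 && days_left <= warn_days )); then
    echo "WARNING"
  else
    echo "OK"
  fi
}

classify_cert -98 14   # prints CRITICAL
classify_cert 10 14    # prints WARNING
classify_cert 10       # prints OK
classify_cert 200 14   # prints OK
```

Under this logic, certificates that have already expired stay CRITICAL regardless of the warning threshold, which would be consistent with the output above being unchanged by --cert-warn 14.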
The plugin also allows you to exclude one or more certificates from the check:
$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t local-certs --cert-warn 14 -i tls-rancher-internal
CHECK_RANCHER2 CRITICAL - 2 certificate(s) expired (cattle-webhook-tls expired 98 days ago - serving-cert expired 82 days ago -) - 1 certificate(s) ignored: tls-rancher-internal|'total_certs'=6;;;; 'expired_certs'=2;;;; 'warning_certs'=0;;;; 'ignored_certs'=1;;;;
This new monitoring check should help to fix a number of Rancher Kubernetes clusters before they run into problems. Expired Kubernetes or Rancher internal certificates are among the most widely reported issues.