Expired certificates inside Kubernetes can be a major pain - not only for the cluster itself (e.g. micro-services inside the cluster could stop communicating) but also for the Kubernetes administrator, who has to force a certificate renewal.
In Rancher-managed Kubernetes (both in RKE and single Docker installations) this problem has been known for a long time. In early 2.0 and 2.1 releases the initial Kubernetes certificates were created with a one-year expiry date - leading to a broken cluster one year later. This was eventually fixed in later Rancher versions. To manually renew the certificates on HA clusters, RKE can be used on the local (management) cluster to rotate (renew) the certificates:
ck@config:~/rancher$ ./rke cert rotate --config 3-node-rancher.yml
INFO[0000] Running RKE version: v1.3.1
INFO[0000] Initiating Kubernetes cluster
INFO[0000] Rotating Kubernetes cluster certificates
[...]
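To verify that the rotation actually resulted in new certificate files, the expiry dates of the certificates on the nodes can be checked with openssl. This is just a quick sanity check; /etc/kubernetes/ssl/ is where RKE stores its certificates on the nodes in my setups, and kube-apiserver.pem is one example file name (adjust path and file to your environment):

root@rancher01:~# openssl x509 -noout -enddate -in /etc/kubernetes/ssl/kube-apiserver.pem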
But not all Kubernetes services automatically reload themselves with new certificates. Today I came across an expired TLS certificate on the Rancher Ingress service running on the Rancher management (local cluster) nodes:
root@rancher01:~# /usr/lib/nagios/plugins/check_http -I 127.0.0.1 -p 443 -C 30,14
CRITICAL - Certificate 'Kubernetes Ingress Controller Fake Certificate' expired on Fri 11 Nov 2022 01:24:14 PM GMT +0000.
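If the monitoring plugin is not available on the node, a plain openssl s_client connection shows the same certificate the Ingress controller currently serves (standard openssl commands, nothing Rancher-specific):

root@rancher01:~# echo | openssl s_client -connect 127.0.0.1:443 2>/dev/null | openssl x509 -noout -enddate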
Luckily this had no impact on the cluster or on the end-users accessing the Rancher API or UI, as the Rancher nodes are never directly exposed to the Internet and all traffic needs to pass through a reverse proxy with a valid certificate (at least in my setups).
Even though the internal certificates in the background were (most likely) already automatically renewed, the long-running Ingress pods still had the original certificates loaded:
ck@linux ~ $ kubectl -n ingress-nginx get pod
NAME                                    READY   STATUS    RESTARTS   AGE
default-http-backend-6977475d9b-l8dcz   1/1     Running   0          441d
nginx-ingress-controller-66pr2          1/1     Running   0          441d
nginx-ingress-controller-lkg28          1/1     Running   0          441d
nginx-ingress-controller-zzsvg          1/1     Running   0          441d
We can see all Ingress pods have been running for 441 days (without a restart). Let's remove one pod after another:
ck@linux ~ $ kubectl -n ingress-nginx delete pod nginx-ingress-controller-66pr2
pod "nginx-ingress-controller-66pr2" deleted
As "nginx-ingress-controller" is a Daemon Set which deploys one pod per node, Kubernetes should detect the "missing" pod and create a new one.
The replacement pod was indeed created a few seconds after the old one was deleted. Shortly after this, I verified the certificate on port 443 again:
root@rancher01:~# /usr/lib/nagios/plugins/check_http -I 127.0.0.1 -p 443 -C 30,14
OK - Certificate 'Kubernetes Ingress Controller Fake Certificate' will expire on Sat 27 Jan 2024 09:50:14 AM GMT +0000.
A new certificate (with a one-year expiry) is now in place.
The same was done for each pod on each node, and all the Ingress certificates were valid again.
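Instead of deleting the pods one by one, they could also be deleted in one go via their common label. The label app=ingress-nginx is what the RKE-deployed Ingress controller uses in my clusters; verify it first with --show-labels before relying on it:

ck@linux ~ $ kubectl -n ingress-nginx get pod --show-labels
ck@linux ~ $ kubectl -n ingress-nginx delete pod -l app=ingress-nginx

Keep in mind that this restarts the Ingress controller on all nodes at (roughly) the same time, so deleting the pods one after another, as done above, avoids a short interruption.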
Besides checking the Rancher Ingress controller certificates (running on port 443 by default, sometimes also on port 8443 on a single Docker installation), it is also worth checking the Kubernetes API server certificate on port 6443.
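The same Nagios plugin can simply be pointed at port 6443 for that; alternatively the openssl command shown further above works as well (the kube-apiserver requests a client certificate, but its server certificate and expiry date are still visible during the TLS handshake):

root@rancher01:~# /usr/lib/nagios/plugins/check_http -I 127.0.0.1 -p 6443 -C 30,14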