Reset (clean up) a Rancher Docker Host / Kubernetes Node


Last updated on July 10th 2023 - Listed in Linux, Virtualization, Containers, Rancher, Cloud, Kubernetes


These days I'm testing Rancher as a potential candidate for a new Docker infrastructure. It's appealing so far: Rancher has a nice and intuitive user interface and, more importantly, a nice API to automatically trigger container creation (for example through Travis).

During a failover test, I rebooted one of the Rancher hosts and when it came back up, the connectivity to Rancher was lost. Why? Because I forgot to add the separate file system for /var/lib/docker, which I had prepared as a logical volume, to /etc/fstab. After the reboot the volume was therefore not mounted, so all previous Docker data was gone, including the rancher-agent container.

Unfortunately I didn't spot the error right away and decided to simply remove the host in Rancher and re-add it manually. Of course, when I then fixed the file system mount problem and rebooted again, Rancher would not connect anymore, because in the meantime a new rancher-agent with a new ID had been installed.
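The missing /etc/fstab entry described above is the kind of mistake a small pre-reboot check can catch. The following is only a minimal sketch: the helper name fstab_has_mountpoint is made up for this example, and it assumes a standard fstab layout where the second field of each line is the mount point.

```shell
#!/usr/bin/env bash
# Sketch: check whether a mount point has an entry in an fstab-style file.
# fstab_has_mountpoint is a hypothetical helper, not part of any tool.

# Usage: fstab_has_mountpoint <mountpoint> [fstab-file]
# Returns 0 if a non-comment line lists <mountpoint> as its second field.
fstab_has_mountpoint() {
    local mountpoint="$1"
    local fstab="${2:-/etc/fstab}"
    awk -v mp="$mountpoint" '
        /^[[:space:]]*#/ { next }   # skip comment lines
        $2 == mp { found = 1 }      # field 2 is the mount point
        END { exit !found }         # exit 0 only if an entry was seen
    ' "$fstab"
}
```

Before rebooting a host, it could be used like this: fstab_has_mountpoint /var/lib/docker || echo "WARNING: /var/lib/docker is not in /etc/fstab"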

Clean up a Rancher 1.x host

To force a reset or cleanup of the Rancher host, one can do the following:

1. Deactivate the affected host in Rancher, then remove the host

2. Stop Docker service

service docker stop

3. Remove Docker and Rancher data:

rm -rf /var/lib/docker/*
rm -rf /var/lib/rancher/*

4. Start Docker service

service docker start

5. Add the host in Rancher
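Steps 2 to 4 above can be wrapped into one function with a dry-run switch, so the destructive part can be reviewed before it runs. This is a sketch under assumptions, not a tested tool: DRY_RUN, run, empty_dir and reset_rancher1_host are names invented for this example, and the use of find with -mindepth and -delete assumes GNU or BSD find. Unlike rm -rf on a glob, it also removes hidden files, while keeping the directory (and therefore a mount point) itself.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-4 for a Rancher 1.x host. All function names and the
# DRY_RUN variable are hypothetical. DRY_RUN defaults to 1 (print only).

# run: print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'DRY-RUN: %s\n' "$*"
    else
        "$@"
    fi
}

# empty_dir: delete everything inside a directory, including dotfiles,
# but keep the directory itself. Silently skips missing directories.
empty_dir() {
    local dir="$1"
    [ -d "$dir" ] || return 0
    run find "$dir" -mindepth 1 -delete
}

reset_rancher1_host() {
    run service docker stop
    empty_dir /var/lib/docker
    empty_dir /var/lib/rancher
    run service docker start
}
```

Calling reset_rancher1_host as-is only prints the commands. Setting DRY_RUN=0 beforehand executes them, which should only be done as root on a host that was already removed in Rancher.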

Clean up a Rancher 2.x Kubernetes node (2.0 - 2.5)

The above commands apply to a Rancher 1.x environment. In Rancher 2.x more directories must be cleaned up:

1. Deactivate (drain) the affected host in Rancher, then remove the host, either in the Rancher UI or, for the "local" cluster, in RKE's YAML config.

2. Stop Docker service 

service docker stop

3. Remove Docker, Rancher, RKE and Kubernetes related data:

mount | grep kubelet | awk '{print $3}' | while read -r mnt; do umount "$mnt"; done
rm -rf /var/lib/docker/*
rm -rf /var/lib/rancher/*
rm -rf /var/lib/etcd
rm -rf /var/lib/kubelet/*
rm -rf /etc/kubernetes
rm -rf /etc/cni
rm -rf /opt/cni
rm -rf /var/lib/cni
rm -rf /var/run/calico
rm -rf /run/secrets/kubernetes.io
test -d /opt/rancher && rm -rf /opt/rancher # For Single Rancher installs
test -d /opt/containerd && rm -rf /opt/containerd
test -d /opt/rke && rm -rf /opt/rke

4. Restart Docker service

service docker restart

Yes, a restart: although the Docker service was stopped earlier, a plain "start" does not re-create the directories within /var/lib/docker (since Docker 20.10.x; see article Docker unable to pull images after-clean up for more information):

root@node:~# service docker start
root@node:~# ll /var/lib/docker/
total 0

A service restart however re-creates the missing directories:

root@node:~# service docker restart
root@node:~# ll /var/lib/docker/
total 44
drwx--x--x 4 root root 4096 Nov 11 14:06 buildkit
drwx--x--- 2 root root 4096 Nov 11 14:06 containers
drwx------ 3 root root 4096 Nov 11 14:06 image
drwxr-x--- 3 root root 4096 Nov 11 14:06 network
drwx--x--- 3 root root 4096 Nov 11 14:06 overlay2
drwx------ 4 root root 4096 Nov 11 14:06 plugins
drwx------ 2 root root 4096 Nov 11 14:06 runtimes
drwx------ 2 root root 4096 Nov 11 14:06 swarm
drwx------ 2 root root 4096 Nov 11 14:06 tmp
drwx------ 2 root root 4096 Nov 11 14:06 trust
drwx-----x 2 root root 4096 Nov 11 14:06 volumes

5. Add the host to a cluster using the sudo docker... command (shown in the Rancher UI) or through the RKE YAML config
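The umount one-liner in step 3 matches the word "kubelet" anywhere in the mount output and unmounts in whatever order the mounts are listed. A slightly stricter variant is sketched below. It assumes /proc/mounts as the source (second field is the mount point), that kubelet mount paths contain no whitespace, and that unmounting the deepest paths first is what is wanted. kubelet_mountpoints and umount_kubelet are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: list and unmount everything mounted below /var/lib/kubelet.
# Function names are made up for this example.

# Print mount points at or below /var/lib/kubelet, deepest path first.
# Usage: kubelet_mountpoints [mounts-file]   (default: /proc/mounts)
kubelet_mountpoints() {
    local mounts="${1:-/proc/mounts}"
    # Match on the mount point field only, and only on the exact directory
    # or paths below it (so /var/lib/kubelet-other would not match).
    awk '$2 ~ /^\/var\/lib\/kubelet(\/|$)/ { print $2 }' "$mounts" \
        | LC_ALL=C sort -r
}

# Unmount each listed mount point; nested mounts go before their parents.
umount_kubelet() {
    kubelet_mountpoints "$@" | while read -r mnt; do
        umount "$mnt"
    done
}
```

Running kubelet_mountpoints on its own first shows what would be unmounted, without changing anything.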

Clean up a Rancher 2.7 Kubernetes node

[... in progress, to be verified ... ]

Kubernetes nodes in Rancher-managed downstream clusters run their containers with their own deployment of containerd. The binaries are located in /var/lib/rancher/rke2/bin and are not installed through the system package repositories.

To reset a Rancher 2.7 downstream cluster node, use the following steps.

1. Deactivate (drain) the affected host in Rancher, then delete the node, either in the Rancher UI or, for the "local" cluster, in RKE's YAML config.

2. Stop the RKE2 and rancher-system-agent services, then delete the related systemd service units

systemctl stop rke2-server.service
systemctl stop rancher-system-agent.service
rm -f /etc/systemd/system/rancher-system*
rm -f /usr/local/lib/systemd/system/rke2-server.service
systemctl daemon-reload

This should (hopefully) stop all the containers (TO BE VERIFIED).

3. Remove Rancher, RKE and Kubernetes related data:

mount | grep kubelet | awk '{print $3}' | while read -r mnt; do umount "$mnt"; done
test -d /var/lib/docker && rm -rf /var/lib/docker/*
rm -rf /var/lib/rancher/*
rm -rf /var/lib/etcd
rm -rf /var/lib/kubelet/*
rm -rf /etc/kubernetes
rm -rf /etc/cni
rm -rf /opt/cni
rm -rf /var/lib/cni
rm -rf /var/run/calico
rm -rf /run/secrets/kubernetes.io
test -d /opt/rancher && rm -rf /opt/rancher # For Single Rancher installs
test -d /opt/containerd && rm -rf /opt/containerd
test -d /opt/rke && rm -rf /opt/rke

4. Reboot

reboot

Once the node is back up, verify that no containerd-shim-runc-v2 processes are running.
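The check for leftover containerd-shim-runc-v2 processes can be scripted. The sketch below assumes a procps-style ps that supports -eo args=. The function names count_shims and verify_no_shims are invented for this example. The full command line is used instead of the short process name because Linux truncates the latter to 15 characters, which would cut off "containerd-shim-runc-v2".

```shell
#!/usr/bin/env bash
# Sketch: detect containerd-shim-runc-v2 processes after the reboot.
# count_shims and verify_no_shims are hypothetical helper names.

# Count lines on stdin that mention containerd-shim-runc-v2.
# The [c] bracket keeps a grep process from matching its own command line.
# grep -c exits 1 when the count is 0, hence the "|| true".
count_shims() {
    grep -c '[c]ontainerd-shim-runc-v2' || true
}

# Return 0 if no shim process is found, 1 otherwise.
verify_no_shims() {
    local n
    n=$(ps -eo args= | count_shims)
    if [ "$n" -gt 0 ]; then
        echo "WARNING: $n containerd-shim-runc-v2 process(es) still running" >&2
        return 1
    fi
    echo "OK: no containerd-shim-runc-v2 processes found"
}
```

Running verify_no_shims after the reboot gives a quick yes or no. A warning would suggest that something on the node still starts containers.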


