After DNS resolvers were replaced and a new /etc/resolv.conf with one new entry was pushed to all Kubernetes nodes, we were surprised to run into massive DNS issues inside Kubernetes once the old DNS servers were powered down. Why?
The first step was to verify whether the node's /etc/resolv.conf had really been updated. And yes, the new DNS servers were showing up:
ck@kubenode:~$ cat /etc/resolv.conf
search example.com
nameserver 10.100.100.53
nameserver 10.100.100.153
But firewall logs still showed DNS requests being sent to the old DNS servers. This could be confirmed with tcpdump:
root@kubenode:~# tcpdump -i ens192 -nn port 53 -X
17:11:22.132590 IP 10.150.79.17.35100 > 10.100.100.53.53: 58552+ A? app.example.com. (35)
[...]
17:11:22.134434 IP 10.100.100.53.53 > 10.150.79.17.35100: 58552 1/0/0 A 1.1.1.1 (51)
[...]
17:11:23.338460 IP 10.150.79.17.54300 > 10.17.0.10.53: 44418+ A? cdn.example.com.
[...]
Here we can see that one DNS query went to the old (and now powered down) nameserver 10.17.0.10.
Was the Docker daemon holding on to this old DNS server in some kind of DNS cache? But why? Shouldn't Kubernetes' own CoreDNS use /etc/resolv.conf for its forwarders?
In this Kubernetes cluster, we have deployed NodeLocal DNSCache to decrease DNS response times. Instead of going to one (or more) central CoreDNS pods, each Kubernetes node runs its own small CoreDNS daemon. With some iptables magic, the Rancher-managed Kubernetes cluster's DNS IP (10.43.0.10) is redirected to the local CoreDNS pod, which listens on the link-local IP 169.254.20.10:
root@kubenode:~# iptables -nvL |grep 169
0 0 ACCEPT udp -- * * 0.0.0.0/0 169.254.20.10 udp dpt:53
0 0 ACCEPT tcp -- * * 0.0.0.0/0 169.254.20.10 tcp dpt:53
0 0 ACCEPT udp -- * * 169.254.20.10 0.0.0.0/0 udp spt:53
0 0 ACCEPT tcp -- * * 169.254.20.10 0.0.0.0/0 tcp spt:53
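To double-check that this redirection actually works, a query can be sent from the node directly against both the link-local IP and the cluster DNS IP with dig; both should be answered by the local node-cache instance. The record name below is just a placeholder:
root@kubenode:~# dig @169.254.20.10 kubernetes.default.svc.cluster.local +short
root@kubenode:~# dig @10.43.0.10 kubernetes.default.svc.cluster.local +short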
By exec-ing into the k8s_node-cache_node-local-dns container, we can check the /etc/resolv.conf there:
root@kubenode:~# docker exec -it 32e056921c9d /bin/sh
# cat /etc/resolv.conf
nameserver 10.100.100.53
nameserver 10.17.0.10
search example.com
Here it is! The /etc/resolv.conf file inside the Node Local DNS container still shows the outdated DNS server 10.17.0.10! While we're in this container, we can check on the CoreDNS config, too:
# cat /etc/coredns/Corefile.base
cluster.local:53 {
errors
cache {
success 9984 5
denial 9984 5
}
reload
loop
bind 169.254.20.10 10.43.0.10
forward . __PILLAR__CLUSTER__DNS__ {
force_tcp
}
prometheus :9253
health 169.254.20.10:8080
}
in-addr.arpa:53 {
errors
cache 30
reload
loop
bind 169.254.20.10 10.43.0.10
forward . /etc/resolv.conf {
force_tcp
}
prometheus :9253
}
ip6.arpa:53 {
errors
cache 30
reload
loop
bind 169.254.20.10 10.43.0.10
forward . /etc/resolv.conf {
force_tcp
}
prometheus :9253
}
.:53 {
errors
cache 30
reload
loop
bind 169.254.20.10 10.43.0.10
forward . /etc/resolv.conf
prometheus :9253
}
# exit
Looking at this CoreDNS config shows that the servers found in /etc/resolv.conf are used as forwarders. And judging by the tcpdump output above, the DNS requests are distributed across all of these forwarders, not only sent to the first one.
Most importantly, we now know: yes, CoreDNS (both the standard and the NodeLocal implementation) uses /etc/resolv.conf for its forwarders. However, it is the /etc/resolv.conf inside the container, not the one from the node.
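A quick way to spot such a mismatch is to diff the node's file against the one inside the container (a sketch, reusing the container ID 32e056921c9d from above):
root@kubenode:~# diff /etc/resolv.conf <(docker exec 32e056921c9d cat /etc/resolv.conf)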
How do we get this updated /etc/resolv.conf from the node into the pods now?
A restart / re-deployment of the NodeLocal DNS pods did not help; the outdated /etc/resolv.conf was still showing up.
Further research led to an interesting bug report on the CoreDNS repository. The description from @kristapsm matched our experience with the outdated /etc/resolv.conf:
I am using Rancher, so basically the issue happened when my nameservers IP got changed and Rancher nodes were not able to ping anything outside like google.com etc, when I changed nameservers in /etc/resolv.conf, nodes were able to ping google.com, but containers running on Rancher were still not able to ping anything externally.
According to this report, the issue is not CoreDNS itself, but rather how CoreDNS is set up by the Kubernetes implementation. As it turns out, this is kubelet's responsibility:
When a Pod is created, the Pod's /etc/resolv.conf is determined by kubelet, based on the Pod's dnsPolicy. In standard K8s deployments, CoreDNS's dnsPolicy is "Default", which means kubelet passes the node's /etc/resolv.conf (or other path if so configured in kubelet). Rancher may be configuring kubelet to use a different path than /etc/resolv.conf...
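If you want to verify this in your own cluster, the dnsPolicy of the node-local-dns DaemonSet can be checked with kubectl (the DaemonSet name may differ depending on how NodeLocal DNSCache was deployed):
kubectl -n kube-system get daemonset node-local-dns -o jsonpath='{.spec.template.spec.dnsPolicy}'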
In a Rancher managed Kubernetes cluster, kubelet is started as a Docker container (in kubeadm implementation this would be a systemd service). Let's take a look into the kubelet container, shall we?
root@kubenode:~# docker exec -it kubelet /bin/bash
root@kubenode:/# cat /etc/resolv.conf
search example.com
nameserver 10.100.100.53
nameserver 10.17.0.10
root@kubenode:/# exit
Here we go again, the old DNS server 10.17.0.10 is still present here, inside the kubelet container!
This means we can restart or delete as many CoreDNS pods as we want - they will be recreated by kubelet with the /etc/resolv.conf from the kubelet container!
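Why does the kubelet container still carry the old file? As far as we understand Docker's behaviour, Docker keeps a per-container copy of the host's /etc/resolv.conf and bind-mounts that copy into the container; it only seems to be refreshed when the container is restarted. The location of that copy can be found with docker inspect:
root@kubenode:~# docker inspect --format '{{ .ResolvConfPath }}' kubelet
root@kubenode:~# cat $(docker inspect --format '{{ .ResolvConfPath }}' kubelet)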
What if we restart kubelet now?
root@kubenode:~# ps auxf|grep kubelet
root 989032 0.0 0.0 9032 740 pts/0 S+ 17:37 0:00 \_ grep --color=auto kubelet
root 760601 8.4 0.5 2512856 144928 ? Ssl 15:41 9:52 \_ kubelet --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --volume-plugin-dir=/var/lib/kubelet/volumeplugins --cni-bin-dir=/opt/cni/bin --streaming-connection-idle-timeout=30m --fail-swap-on=false --network-plugin=cni --event-qps=0 --address=0.0.0.0 --read-only-port=0 --make-iptables-util-chains=true --client-ca-file=/etc/kubernetes/ssl/kube-ca.pem --cloud-provider= --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-node.yaml --root-dir=/var/lib/kubelet --anonymous-auth=false --cni-conf-dir=/etc/cni/net.d --cgroups-per-qos=True --hostname-override=onl-radoi17-p --pod-infra-container-image=rancher/mirrored-pause:3.2 --authentication-token-webhook=true --resolv-conf=/etc/resolv.conf --v=2 --authorization-mode=Webhook --cgroup-driver=cgroupfs
root@kubenode:~# docker restart kubelet
kubelet
root@kubenode:~# docker exec -it kubelet /bin/bash
root@kubenode:/# cat /etc/resolv.conf
search example.com
nameserver 10.100.100.53
nameserver 10.100.100.153
root@kubenode:/# exit
Wow, finally the old DNS server is gone; the /etc/resolv.conf is now updated!
But for now only kubelet has this new /etc/resolv.conf; we still need the NodeLocal DNSCache pods to be recreated so that they get the new resolv.conf, too (which will be used as the CoreDNS forwarders, remember?).
There are a couple of ways to do this. An easy method is to use the Rancher UI: navigate to the System project and redeploy the node-local-dns and coredns workloads from there.
Or on the Kubernetes node itself, restart the container starting with k8s_POD_node-local-dns:
root@kubenode:~# docker restart $(docker ps | grep k8s_POD_node-local-dns | awk '{print $1}')
This container is responsible for updating the actual NodeLocal CoreDNS container (starting with k8s_node-cache_node-local-dns):
root@kubenode:~# docker ps | grep local
dccc71394e0e 5bae806f8f12 "/node-cache -localiā¦" 16 hours ago Up 16 hours k8s_node-cache_node-local-dns-dhfz2_kube-system_bf38b26c-aba9-4279-8865-1d87e8f9bdd1_0
f31676e7c8b8 rancher/mirrored-pause:3.2 "/pause" 16 hours ago Up 30 minutes k8s_POD_node-local-dns-dhfz2_kube-system_bf38b26c-aba9-4279-8865-1d87e8f9bdd1_0
Since the k8s_POD_node-local-dns container was restarted (according to the output, 30 minutes ago), let's check the /etc/resolv.conf contents of the k8s_node-cache_node-local-dns container:
root@kubenode:~# docker exec -it $(docker ps | grep k8s_node-cache_node-local-dns | awk '{print $1}') cat /etc/resolv.conf
search example.com
nameserver 10.100.100.53
nameserver 10.100.100.153
Hurray, the /etc/resolv.conf is updated and therefore CoreDNS will now use the correct DNS forwarders!
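To confirm end-to-end resolution through the cluster DNS, a short-lived test pod can be used (the image and the record name are just examples):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup app.example.com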
When the DNS resolvers are changed and pushed to a Rancher-managed Kubernetes node, it's not enough to simply enter the new DNS servers into the node's /etc/resolv.conf. Here's the order to follow:
1. Update the node's /etc/resolv.conf with the new nameservers.
2. Restart the kubelet container so that it picks up the updated /etc/resolv.conf.
3. Redeploy or restart the NodeLocal DNSCache (and CoreDNS) pods so that they are recreated with the new resolv.conf as forwarders.
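Put into commands, the whole procedure on one affected node roughly looks like this (a sketch only, reusing the container names from the Rancher setup shown above):
root@kubenode:~# vi /etc/resolv.conf    # 1. enter the new nameservers
root@kubenode:~# docker restart kubelet    # 2. let kubelet pick up the updated file
root@kubenode:~# docker restart $(docker ps | grep k8s_POD_node-local-dns | awk '{print $1}')    # 3. recreate the NodeLocal DNS pod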
Carlos wrote on Jul 1st, 2024:
A third option to restart the NodeLocal DNS cache pods is to restart the daemonset:
kubectl -n kube-system rollout restart daemonset node-local-dns
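To watch that restart complete, the rollout status can be followed:
kubectl -n kube-system rollout status daemonset node-local-dns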