While upgrading a downstream Kubernetes cluster in a Rancher 2.8.3 environment (Single Docker Install), the upgrade process crashed midway.
The Rancher container exited with the following errors:
Trace[1742245805]: [1m0.025463487s] [1m0.025463487s] END
E0601 06:22:56.095403 47 reflector.go:147] pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229: Failed to watch *v1.Node: failed to list *v1.Node: the server was unable to return a response in the time allotted, but may still be processing the request (get nodes.meta.k8s.io)
W0601 06:22:56.130746 47 reflector.go:535] pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229: failed to list *v1.ConfigMap: the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps.meta.k8s.io)
I0601 06:22:56.131061 47 trace.go:236] Trace[24473488]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229 (01-Jun-2024 06:21:56.095) (total time: 60035ms):
Trace[24473488]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps.meta.k8s.io) 60035ms (06:22:56.130)
Trace[24473488]: [1m0.035779536s] [1m0.035779536s] END
E0601 06:22:56.132255 47 reflector.go:147] pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps.meta.k8s.io)
W0601 06:22:56.243505 47 reflector.go:535] pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229: failed to list *v1.APIService: the server was unable to return a response in the time allotted, but may still be processing the request (get apiservices.meta.k8s.io)
I0601 06:22:56.243736 47 trace.go:236] Trace[719701115]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229 (01-Jun-2024 06:21:56.095) (total time: 60147ms):
Trace[719701115]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get apiservices.meta.k8s.io) 60147ms (06:22:56.243)
Trace[719701115]: [1m0.147816221s] [1m0.147816221s] END
E0601 06:22:56.243987 47 reflector.go:147] pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229: Failed to watch *v1.APIService: failed to list *v1.APIService: the server was unable to return a response in the time allotted, but may still be processing the request (get apiservices.meta.k8s.io)
2024/06/01 06:22:57 [INFO] [planner] rkecluster fleet-default/vp-uat: configuring bootstrap node(s) custom-203e2ddaf654: waiting for probes: kube-apiserver
2024/06/01 06:22:58 httputil: ReverseProxy read error during body copy: unexpected EOF
2024/06/01 06:22:58 [ERROR] error syncing 'rancher-charts': handler bootstrap-charts: Delete "https://127.0.0.1:6444/api/v1/namespaces/rancher-operator-system": read tcp 127.0.0.1:35246->127.0.0.1:6444: read: connection reset by peer, requeuing
2024/06/01 06:22:58 [ERROR] error syncing 'cluster/fleet-default/vp-uat': handler auth-prov-v2-roletemplate: failed to update fleet-default/crt-vp-uat-nodes-manage rbac.authorization.k8s.io/v1, Kind=Role for auth-prov-v2-roletemplate-vp-uat nodes-manage: Patch "https://127.0.0.1:6444/apis/rbac.authorization.k8s.io/v1/namespaces/fleet-default/roles/crt-vp-uat-nodes-manage?timeout=15m0s": read tcp 127.0.0.1:35246->127.0.0.1:6444: read: connection reset by peer, requeuing
2024/06/01 06:22:58 [ERROR] error syncing 'fleet-default/vp-uat-managed-system-upgrade-controller': handler mcc-bundle: Put "https://127.0.0.1:6444/apis/management.cattle.io/v3/namespaces/fleet-default/managedcharts/vp-uat-managed-system-upgrade-controller/status": read tcp 127.0.0.1:35246->127.0.0.1:6444: read: connection reset by peer, requeuing
2024/06/01 06:22:59 httputil: ReverseProxy read error during body copy: unexpected EOF
2024/06/01 06:22:59 httputil: ReverseProxy read error during body copy: unexpected EOF
2024/06/01 06:22:59 httputil: ReverseProxy read error during body copy: unexpected EOF
2024/06/01 06:22:59 httputil: ReverseProxy read error during body copy: unexpected EOF
2024/06/01 06:22:59 [ERROR] error syncing 'cluster/fleet-default/vp-uat': handler auth-prov-v2-roletemplate: failed to update fleet-default/crt-vp-uat-cluster-member rbac.authorization.k8s.io/v1, Kind=Role for auth-prov-v2-roletemplate-vp-uat cluster-member: Patch "https://127.0.0.1:6444/apis/rbac.authorization.k8s.io/v1/namespaces/fleet-default/roles/crt-vp-uat-cluster-member?timeout=15m0s": read tcp 127.0.0.1:48326->127.0.0.1:6444: read: connection reset by peer, requeuing
2024/06/01 06:22:59 [FATAL] k3s exited with: exit status 1
The Rancher container was automatically restarted (due to the --restart=unless-stopped flag) and the Kubernetes upgrade of the downstream cluster continued... but then got stuck again with error messages such as:
2024/06/01 06:29:14 [INFO] [planner] rkecluster fleet-default/vp-uat: custom-203e2ddaf654
2024/06/01 06:29:14 [INFO] [planner] rkecluster fleet-default/vp-uat: configuring etcd node(s) custom-6364ff527a36: waiting for probes: calico
2024/06/01 06:29:17 [ERROR] error syncing 'rancher-charts': handler helm-clusterrepo-ensure: ensure failure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721 reset --hard FETCH_HEAD error: exit status 128, detail: fatal: Unable to create '/var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721/.git/index.lock': File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
, handler helm-clusterrepo-download: update failure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721 reset --hard HEAD error: exit status 128, detail: fatal: Unable to create '/var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721/.git/index.lock': File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
Apparently a git operation was started shortly before the crash but never finished, leaving its lock file behind. To let the Kubernetes upgrade of the downstream cluster continue, this stale lock file needs to be removed from within the Rancher container:
root@rancher:~# docker exec -it rancher /bin/bash
1852ace9c2d6:/var/lib/rancher # rm /var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721/.git/index.lock
1852ace9c2d6:/var/lib/rancher # exit
The downstream cluster upgrade then continued and eventually finished (without another crash of the Rancher container) after a few minutes.
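The hash-named directory below local-catalogs/v2 will differ between installations, and other catalog repositories could be affected too. As a rough sketch (not an official Rancher tool), the following POSIX shell functions look for leftover .git/index.lock files below the catalog directory and remove them, but only if no git process is currently running, following the advice in the git error message itself. Assumptions: the functions are run inside the Rancher container, /proc is available (Linux), and `find` exists in the image. The names `git_running` and `clear_stale_git_locks` are hypothetical.

```shell
#!/bin/sh
# Sketch: remove leftover git index.lock files below Rancher's catalog directory.
# Assumptions: runs inside the Rancher container, Linux /proc is available,
# and the default path matches the one seen in the error messages above.

# Return 0 if any process named "git" is currently running (scans /proc).
git_running() {
  for comm in /proc/[0-9]*/comm; do
    if [ "$(cat "$comm" 2>/dev/null)" = "git" ]; then
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: print and delete every .git/index.lock below $1.
clear_stale_git_locks() {
  root="${1:-/var/lib/rancher-data/local-catalogs}"
  if [ ! -d "$root" ]; then
    echo "directory not found: $root" >&2
    return 1
  fi
  # A lock held by a live git process is not stale, so refuse to touch it.
  if git_running; then
    echo "a git process is still running, not removing any lock files" >&2
    return 2
  fi
  # Print each lock file path, then remove it.
  find "$root" -type f -path '*/.git/index.lock' -print -exec rm -f {} \;
}
```

For example, after `docker exec -it rancher /bin/bash`, paste both functions into the shell and run `clear_stale_git_locks`; it prints the path of every lock file it deletes. An exit code of 2 means a git process is still active, in which case waiting a moment and retrying is safer than forcing the removal.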