Rancher 2.8 Kubernetes Upgrade on downstream cluster failed (another git process seems to be running)


While upgrading a downstream Kubernetes cluster in a Rancher 2.8.3 environment (Single Docker Install), the upgrade process crashed partway through.

The Rancher container exited with the following error:

Trace[1742245805]: [1m0.025463487s] [1m0.025463487s] END
E0601 06:22:56.095403      47 reflector.go:147] pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229: Failed to watch *v1.Node: failed to list *v1.Node: the server was unable to return a response in the time allotted, but may still be processing the request (get nodes.meta.k8s.io)
W0601 06:22:56.130746      47 reflector.go:535] pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229: failed to list *v1.ConfigMap: the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps.meta.k8s.io)
I0601 06:22:56.131061      47 trace.go:236] Trace[24473488]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229 (01-Jun-2024 06:21:56.095) (total time: 60035ms):
Trace[24473488]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps.meta.k8s.io) 60035ms (06:22:56.130)
Trace[24473488]: [1m0.035779536s] [1m0.035779536s] END
E0601 06:22:56.132255      47 reflector.go:147] pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps.meta.k8s.io)
W0601 06:22:56.243505      47 reflector.go:535] pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229: failed to list *v1.APIService: the server was unable to return a response in the time allotted, but may still be processing the request (get apiservices.meta.k8s.io)
I0601 06:22:56.243736      47 trace.go:236] Trace[719701115]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229 (01-Jun-2024 06:21:56.095) (total time: 60147ms):
Trace[719701115]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get apiservices.meta.k8s.io) 60147ms (06:22:56.243)
Trace[719701115]: [1m0.147816221s] [1m0.147816221s] END
E0601 06:22:56.243987      47 reflector.go:147] pkg/mod/github.com/rancher/client-go@v1.28.6-rancher1/tools/cache/reflector.go:229: Failed to watch *v1.APIService: failed to list *v1.APIService: the server was unable to return a response in the time allotted, but may still be processing the request (get apiservices.meta.k8s.io)
2024/06/01 06:22:57 [INFO] [planner] rkecluster fleet-default/vp-uat: configuring bootstrap node(s) custom-203e2ddaf654: waiting for probes: kube-apiserver
2024/06/01 06:22:58 httputil: ReverseProxy read error during body copy: unexpected EOF
2024/06/01 06:22:58 [ERROR] error syncing 'rancher-charts': handler bootstrap-charts: Delete "https://127.0.0.1:6444/api/v1/namespaces/rancher-operator-system": read tcp 127.0.0.1:35246->127.0.0.1:6444: read: connection reset by peer, requeuing
2024/06/01 06:22:58 [ERROR] error syncing 'cluster/fleet-default/vp-uat': handler auth-prov-v2-roletemplate: failed to update fleet-default/crt-vp-uat-nodes-manage rbac.authorization.k8s.io/v1, Kind=Role for auth-prov-v2-roletemplate-vp-uat nodes-manage: Patch "https://127.0.0.1:6444/apis/rbac.authorization.k8s.io/v1/namespaces/fleet-default/roles/crt-vp-uat-nodes-manage?timeout=15m0s": read tcp 127.0.0.1:35246->127.0.0.1:6444: read: connection reset by peer, requeuing
2024/06/01 06:22:58 [ERROR] error syncing 'fleet-default/vp-uat-managed-system-upgrade-controller': handler mcc-bundle: Put "https://127.0.0.1:6444/apis/management.cattle.io/v3/namespaces/fleet-default/managedcharts/vp-uat-managed-system-upgrade-controller/status": read tcp 127.0.0.1:35246->127.0.0.1:6444: read: connection reset by peer, requeuing
2024/06/01 06:22:59 httputil: ReverseProxy read error during body copy: unexpected EOF
2024/06/01 06:22:59 httputil: ReverseProxy read error during body copy: unexpected EOF
2024/06/01 06:22:59 httputil: ReverseProxy read error during body copy: unexpected EOF
2024/06/01 06:22:59 httputil: ReverseProxy read error during body copy: unexpected EOF
2024/06/01 06:22:59 [ERROR] error syncing 'cluster/fleet-default/vp-uat': handler auth-prov-v2-roletemplate: failed to update fleet-default/crt-vp-uat-cluster-member rbac.authorization.k8s.io/v1, Kind=Role for auth-prov-v2-roletemplate-vp-uat cluster-member: Patch "https://127.0.0.1:6444/apis/rbac.authorization.k8s.io/v1/namespaces/fleet-default/roles/crt-vp-uat-cluster-member?timeout=15m0s": read tcp 127.0.0.1:48326->127.0.0.1:6444: read: connection reset by peer, requeuing
2024/06/01 06:22:59 [FATAL] k3s exited with: exit status 1

The Rancher container was automatically restarted (due to the --restart=unless-stopped flag) and the Kubernetes upgrade of the downstream cluster continued, but then got stuck with the following error messages:

2024/06/01 06:29:14 [INFO] [planner] rkecluster fleet-default/vp-uat: custom-203e2ddaf654
2024/06/01 06:29:14 [INFO] [planner] rkecluster fleet-default/vp-uat: configuring etcd node(s) custom-6364ff527a36: waiting for probes: calico
2024/06/01 06:29:17 [ERROR] error syncing 'rancher-charts': handler helm-clusterrepo-ensure: ensure failure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721 reset --hard FETCH_HEAD error: exit status 128, detail: fatal: Unable to create '/var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
, handler helm-clusterrepo-download: update failure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721 reset --hard HEAD error: exit status 128, detail: fatal: Unable to create '/var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721/.git/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
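
The error message names the exact file git could not create. As a minimal sketch, the path can be pulled out of the container log instead of being copied by hand. The function name `extract_lock_paths` is made up for this example; the container name `rancher` is the one used in this setup and may differ elsewhere.

```shell
#!/usr/bin/env bash
# Sketch: read log text on stdin and print each distinct index.lock path
# that git reported as "Unable to create".
# extract_lock_paths is a hypothetical helper name, not part of Rancher or git.
extract_lock_paths() {
  grep -o "Unable to create '[^']*index\.lock'" \
    | sed "s/^Unable to create '//; s/'\$//" \
    | sort -u
}

# Example call on the Docker host (assumes the container is named "rancher"):
#   docker logs rancher 2>&1 | extract_lock_paths
```

The same lock file usually shows up several times in the log (once per failing handler and once per retry), which is why the output is de-duplicated with `sort -u`.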

It seems that a git operation had been started before the crash and, because the container died before the operation finished, its lock file (.git/index.lock) was left behind. To continue the Kubernetes upgrade of the downstream cluster, this stale lock file needs to be removed from within the Rancher container:

root@rancher:~# docker exec -it rancher /bin/bash
1852ace9c2d6:/var/lib/rancher # rm /var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721/.git/index.lock
1852ace9c2d6:/var/lib/rancher # exit
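
Git's own hint says to make sure no git process is still running before removing the lock by hand. The following is a rough sketch of that check, meant to be run inside the Rancher container. The function names are invented for this example, the process check relies on Linux's /proc, and the catalog path in the usage comment is taken from the error message above; none of this is an official Rancher procedure.

```shell
#!/usr/bin/env bash
# Sketch: remove leftover .git/index.lock files below a directory,
# but only if no git process is currently running.
# All function names here are hypothetical helpers for illustration.

find_git_locks() {
  # Print every file named index.lock that sits directly inside a .git directory.
  local base="$1"
  find "$base" -type f -path '*/.git/index.lock' 2>/dev/null
}

git_is_running() {
  # Return 0 if any process named "git" is listed in /proc (Linux only).
  local comm
  for comm in /proc/[0-9]*/comm; do
    [ -r "$comm" ] || continue
    if [ "$(cat "$comm" 2>/dev/null)" = "git" ]; then
      return 0
    fi
  done
  return 1
}

remove_stale_git_locks() {
  local base="$1"
  if git_is_running; then
    echo "a git process is still running, leaving lock files alone" >&2
    return 1
  fi
  find_git_locks "$base" | while IFS= read -r lock; do
    echo "removing $lock"
    rm -f -- "$lock"
  done
}

# Example call inside the container (path taken from the error message):
#   remove_stale_git_locks /var/lib/rancher-data/local-catalogs/v2
```

Because Rancher retries the catalog sync on its own, a git process can briefly appear between retries; if the check refuses to remove the lock, running it again a few seconds later is usually enough.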

The downstream cluster upgrade then continued and eventually finished (without another crash of the Rancher container) after a few minutes.

