GlusterFS service not starting after system reboot (Initialization of volume management failed)

Written by - 0 comments

Published on - Listed in GlusterFS Linux Python Ansible


In a few clustered application environments I successfully use GlusterFS as underlying file system. Data written into that filesystem is immediately replicated to other GlusterFS nodes (depending on the replica setting of the Gluster volume).

GlusterFS

This setup has been working great - until recently when the GlusterFS daemon (glusterd) would not start anymore after a system (Ubuntu) reboot. What happened?

Job for glusterd.service failed

In certain situations GlusterFS might not start up correctly when the Gluster service/daemon (glusterd) is started before the networking services are fully working. The node is then not able to communicate with the GlusterFS cluster and hence fails to start. This was my initial thought, so once connected to this machine I simply tried another start of glusterd:

root@glusternode:~# systemctl start glusterd
Job for glusterd.service failed because the control process exited with error code.
See "systemctl status glusterd.service" and "journalctl -xe" for details.

Umm... So that was not it. The glusterd service is still not able to start.

Unfortunately the hinted journalctl -xe does not show anything helpful, so I checked the glusterfs logs while attempting another start of the service:

root@glusternode:~# tail -f /var/log/glusterfs/glusterd.log
[2024-09-26 04:56:16.903611] W [MSGID: 103071] [rdma.c:4631:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2024-09-26 04:56:16.903790] W [MSGID: 103055] [rdma.c:4940:init] 0-rdma.management: Failed to initialize IB Device
[2024-09-26 04:56:16.903917] W [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2024-09-26 04:56:16.904176] W [rpcsvc.c:1770:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2024-09-26 04:56:16.904272] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2024-09-26 04:56:16.960877] E [MSGID: 106229] [glusterd.c:525:glusterd_check_gsync_present] 0-glusterd: geo-replication module not working as desired
[2024-09-26 04:56:16.961206] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2024-09-26 04:56:16.961297] E [MSGID: 101066] [graph.c:327:glusterfs_graph_init] 0-management: initializing translator failed
[2024-09-26 04:56:16.961362] E [MSGID: 101176] [graph.c:698:glusterfs_graph_activate] 0-graph: init failed
[2024-09-26 04:56:16.963224] W [glusterfsd.c:1393:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xf7) [0x55c5d2a5fff7] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x193) [0x55c5d2a5feb3] -->/usr/sbin/glusterd(cleanup_and_exit+0x5a) [0x55c5d2a5c44a] ) 0-: received signum (-1), shutting down

The glusterd.log basically says: Glusterd was unable to start. Wow. Thanks. But why exactly the daemon is starting to fail is incomprehensible from this output.

During my online research I came across an issue on the glusterfs GitHub repo. This issue describes the exact same issue as I'm experiencing on this Ubuntu machine. But the ticket is (as of now) still open and no solution was presented. On another page I came across an important hint to start glusterd with the --debug parameter.

Debug log shows Python print error

The log files in /var/log/glusterfs only contain I(nformation), W(arning) and E(rror) entries. Maybe the D(ebug) level would help? For this I started the glusterd binary with the --debug flag. Here's the full output:

root@glusternode:~# glusterd --debug
[2024-09-26 04:59:58.306799] I [MSGID: 100030] [glusterfsd.c:2556:main] 0-glusterd: Started running glusterd version 3.13.2 (args: glusterd --debug)
[2024-09-26 04:59:58.307173] D [logging.c:1817:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2024-09-26 04:59:58.308589] D [MSGID: 0] [glusterfsd.c:683:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol
[2024-09-26 04:59:58.322335] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2024-09-26 04:59:58.322738] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2024-09-26 04:59:58.324701] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2024-09-26 04:59:58.325225] D [MSGID: 0] [glusterd.c:476:glusterd_rpcsvc_options_build] 0-glusterd: listen-backlog value: 10
[2024-09-26 04:59:58.325537] D [rpcsvc.c:2512:rpcsvc_init] 0-rpc-service: RPC service inited.
[2024-09-26 04:59:58.325774] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2024-09-26 04:59:58.326045] D [rpc-transport.c:279:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib/x86_64-linux-gnu/glusterfs/3.13.2/rpc-transport/socket.so
[2024-09-26 04:59:58.331182] D [socket.c:4236:socket_init] 0-socket.management: Configued transport.tcp-user-timeout=0
[2024-09-26 04:59:58.331510] D [socket.c:4254:socket_init] 0-socket.management: Reconfigued transport.keepalivecnt=9
[2024-09-26 04:59:58.331754] D [socket.c:4339:socket_init] 0-socket.management: SSL support on the I/O path is NOT enabled
[2024-09-26 04:59:58.332007] D [socket.c:4342:socket_init] 0-socket.management: SSL support for glusterd is NOT enabled
[2024-09-26 04:59:58.332250] D [socket.c:4359:socket_init] 0-socket.management: using system polling thread
[2024-09-26 04:59:58.332498] D [name.c:598:server_fill_address_family] 0-socket.management: option address-family not specified, defaulting to inet
[2024-09-26 04:59:58.333000] D [rpc-transport.c:279:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib/x86_64-linux-gnu/glusterfs/3.13.2/rpc-transport/rdma.so
[2024-09-26 04:59:58.336458] W [MSGID: 103071] [rdma.c:4631:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2024-09-26 04:59:58.336856] W [MSGID: 103055] [rdma.c:4940:init] 0-rdma.management: Failed to initialize IB Device
[2024-09-26 04:59:58.337211] W [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2024-09-26 04:59:58.337784] W [rpcsvc.c:1770:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2024-09-26 04:59:58.338136] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2024-09-26 04:59:58.338525] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc peer, Num: 1238437, Ver: 2, Port: 0
[2024-09-26 04:59:58.338846] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli read-only, Num: 1238463, Ver: 2, Port: 0
[2024-09-26 04:59:58.339257] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt, Num: 1238433, Ver: 2, Port: 0
[2024-09-26 04:59:58.339583] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt v3, Num: 1238433, Ver: 3, Port: 0
[2024-09-26 04:59:58.339886] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, Ver: 1, Port: 0
[2024-09-26 04:59:58.340277] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Handshake, Num: 14398633, Ver: 2, Port: 0
[2024-09-26 04:59:58.340604] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster MGMT Handshake, Num: 1239873, Ver: 1, Port: 0
[2024-09-26 04:59:58.341030] D [rpcsvc.c:2512:rpcsvc_init] 0-rpc-service: RPC service inited.
[2024-09-26 04:59:58.341436] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2024-09-26 04:59:58.341807] D [rpc-transport.c:279:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib/x86_64-linux-gnu/glusterfs/3.13.2/rpc-transport/socket.so
[2024-09-26 04:59:58.342361] D [socket.c:4197:socket_init] 0-socket.management: disabling nodelay
[2024-09-26 04:59:58.342775] D [socket.c:4236:socket_init] 0-socket.management: Configued transport.tcp-user-timeout=0
[2024-09-26 04:59:58.343149] D [socket.c:4254:socket_init] 0-socket.management: Reconfigued transport.keepalivecnt=9
[2024-09-26 04:59:58.343483] D [socket.c:4339:socket_init] 0-socket.management: SSL support on the I/O path is NOT enabled
[2024-09-26 04:59:58.343780] D [socket.c:4342:socket_init] 0-socket.management: SSL support for glusterd is NOT enabled
[2024-09-26 04:59:58.344085] D [socket.c:4359:socket_init] 0-socket.management: using system polling thread
[2024-09-26 04:59:58.344571] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli, Num: 1238463, Ver: 2, Port: 0
[2024-09-26 04:59:58.344905] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Handshake (CLI Getspec), Num: 14398633, Ver: 2, Port: 0
[2024-09-26 04:59:58.345298] D [MSGID: 0] [glusterd-utils.c:7544:glusterd_sm_tr_log_init] 0-glusterd: returning 0
[2024-09-26 04:59:58.345626] D [MSGID: 0] [glusterd.c:1855:init] 0-management: cannot get run-with-valgrind value
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 461
    print svol_uuid
                  ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(svol_uuid)?

[2024-09-26 04:59:58.398774] E [MSGID: 106229] [glusterd.c:525:glusterd_check_gsync_present] 0-glusterd: geo-replication module not working as desired
[2024-09-26 04:59:58.399350] D [MSGID: 0] [glusterd.c:535:glusterd_check_gsync_present] 0-glusterd: Returning -1
[2024-09-26 04:59:58.399683] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2024-09-26 04:59:58.399939] E [MSGID: 101066] [graph.c:327:glusterfs_graph_init] 0-management: initializing translator failed
[2024-09-26 04:59:58.400204] E [MSGID: 101176] [graph.c:698:glusterfs_graph_activate] 0-graph: init failed
[2024-09-26 04:59:58.401297] D [logging.c:1791:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 5 extra log messages
[2024-09-26 04:59:58.401572] D [logging.c:1794:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages
[2024-09-26 04:59:58.402081] W [glusterfsd.c:1393:cleanup_and_exit] (-->glusterd(glusterfs_volumes_init+0xf7) [0x559b83183ff7] -->glusterd(glusterfs_process_volfp+0x193) [0x559b83183eb3] -->glusterd(cleanup_and_exit+0x5a) [0x559b8318044a] ) 0-: received signum (-1), shutting down
[2024-09-26 04:59:58.402412] D [glusterfsd-mgmt.c:2691:glusterfs_mgmt_pmap_signout] 0-fsd-mgmt: portmapper signout arguments not given

The SyntaxError immediately caught my eye:

[2024-09-26 04:59:58.345626] D [MSGID: 0] [glusterd.c:1855:init] 0-management: cannot get run-with-valgrind value
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 461
    print svol_uuid
                  ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(svol_uuid)?

This must be a Python error! I worked on a couple of Python scripts before - and as I'm not that young anymore also experienced the switch from Python2 to Python3. The most significant syntax change from Python2 to Python3, at least to me, is the way the print function works.

Python2: print "Hello World\n" worked just fine

Python3: print("Hello World\n") requires parentheses

But why would this Python error be shown after the reboot? Has there been a change to the system?

Changing system wide python back to python2

And then I remembered! A few days prior to the system reboot an Ansible playbook (basically our base-configuration playbook, integrating the servers into monitoring, Netbox inventory, etc) was run. And inside this playbook there's a specific task to pin /usr/bin/python to python3 using update-alternatives:

  # Set alternatives for /usr/bin/python
  - name: BASE - Check if python3 is installed
    stat: path=/usr/bin/python3
    register: python3

  - name: BASE - Set alternatives for /usr/bin/python
    alternatives: name=python link=/usr/bin/python path=/usr/bin/python3
    when: python3.stat.exists == true and (ansible_distribution_release != "stretch")

This task is only executed if the /usr/bin/python3 binary exists and if the server is not a Debian Stretch (9). The reason is that python2 was the default Python version in Debian 9.

And the very same situation applies to this GlusterFS server, as it turns out to run on Ubuntu 18.04 (Bionic) - which uses python2 by default!

The playbook was adjusted, to not run into this problem in the future anymore (I probably won't as all Ubuntu 18.04 are EOL and must be upgraded), and to ignore "bionic" releases.

To fix the Python version (reverting back to the default Python2) on this GlusterFS node I could follow my own how to point /usr/bin/python to a specific version:

root@glusternode:~# which python
/usr/bin/python

root@glusternode:~# /usr/bin/python -V
Python 3.6.9

root@glusternode:~# update-alternatives --install /usr/bin/python python /usr/bin/python2 1

root@glusternode:~# update-alternatives --list python
/usr/bin/python2
/usr/bin/python3

root@glusternode:~# update-alternatives --remove python /usr/bin/python3
update-alternatives: removing manually selected alternative - switching python to auto mode
update-alternatives: using /usr/bin/python2 to provide /usr/bin/python (python) in auto mode

root@glusternode:~# update-alternatives --list python
/usr/bin/python2

root@glusternode:~# python -V
Python 2.7.17

Welcome back, GlusterFS cluster

After the Python version change, the glusterd service could be started again:

root@glusternode:~# systemctl start glusterd
root@glusternode:~# systemctl status glusterd
[...]
Sep 26 09:15:52 glusternode systemd[1]: Starting GlusterFS, a clustered file-system server...
Sep 26 09:15:56 glusternode systemd[1]: Started GlusterFS, a clustered file-system server.

And a few seconds later the GlusterFS volume could be mounted again.


Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.

RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Observability   Office   OpenSearch   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder