In a few clustered application environments I successfully use GlusterFS as the underlying file system. Data written to that file system is immediately replicated to the other GlusterFS nodes (depending on the replica setting of the Gluster volume).
This setup had been working great - until recently, when the GlusterFS daemon (glusterd) would no longer start after a system (Ubuntu) reboot. What happened?
In certain situations GlusterFS might not start up correctly when the Gluster service/daemon (glusterd) is launched before the networking services are fully up. The node is then unable to communicate with the rest of the GlusterFS cluster and hence fails to start. This was my initial thought, so once connected to the machine I simply tried another start of glusterd:
root@glusternode:~# systemctl start glusterd
Job for glusterd.service failed because the control process exited with error code.
See "systemctl status glusterd.service" and "journalctl -xe" for details.
Umm... So that was not it. The glusterd service is still not able to start.
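For reference: had glusterd merely raced the network at boot, the usual remedy would be a systemd override that delays the daemon until the network is fully online. A minimal sketch, assuming the stock glusterd.service unit:

root@glusternode:~# systemctl edit glusterd
# In the editor that opens, add the following
# (saved as /etc/systemd/system/glusterd.service.d/override.conf):
[Unit]
Wants=network-online.target
After=network-online.target

But a manual start failing long after boot rules out such an ordering problem.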
Unfortunately the suggested journalctl -xe does not show anything helpful, so I checked the GlusterFS logs while attempting another start of the service:
root@glusternode:~# tail -f /var/log/glusterfs/glusterd.log
[2024-09-26 04:56:16.903611] W [MSGID: 103071] [rdma.c:4631:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2024-09-26 04:56:16.903790] W [MSGID: 103055] [rdma.c:4940:init] 0-rdma.management: Failed to initialize IB Device
[2024-09-26 04:56:16.903917] W [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2024-09-26 04:56:16.904176] W [rpcsvc.c:1770:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2024-09-26 04:56:16.904272] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2024-09-26 04:56:16.960877] E [MSGID: 106229] [glusterd.c:525:glusterd_check_gsync_present] 0-glusterd: geo-replication module not working as desired
[2024-09-26 04:56:16.961206] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2024-09-26 04:56:16.961297] E [MSGID: 101066] [graph.c:327:glusterfs_graph_init] 0-management: initializing translator failed
[2024-09-26 04:56:16.961362] E [MSGID: 101176] [graph.c:698:glusterfs_graph_activate] 0-graph: init failed
[2024-09-26 04:56:16.963224] W [glusterfsd.c:1393:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xf7) [0x55c5d2a5fff7] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x193) [0x55c5d2a5feb3] -->/usr/sbin/glusterd(cleanup_and_exit+0x5a) [0x55c5d2a5c44a] ) 0-: received signum (-1), shutting down
The glusterd.log basically says: glusterd was unable to start. Wow. Thanks. But why exactly the daemon fails to start cannot be deduced from this output.
During my online research I came across an issue in the glusterfs GitHub repository, describing the exact same problem I was experiencing on this Ubuntu machine. But the ticket is (as of now) still open and no solution was presented. On another page I came across an important hint: start glusterd with the --debug parameter.
The log files in /var/log/glusterfs only contain I(nformation), W(arning) and E(rror) entries. Maybe the D(ebug) level would help? To get it, I started the glusterd binary directly with the --debug flag. Here's the full output:
root@glusternode:~# glusterd --debug
[2024-09-26 04:59:58.306799] I [MSGID: 100030] [glusterfsd.c:2556:main] 0-glusterd: Started running glusterd version 3.13.2 (args: glusterd --debug)
[2024-09-26 04:59:58.307173] D [logging.c:1817:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2024-09-26 04:59:58.308589] D [MSGID: 0] [glusterfsd.c:683:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol
[2024-09-26 04:59:58.322335] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2024-09-26 04:59:58.322738] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2024-09-26 04:59:58.324701] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2024-09-26 04:59:58.325225] D [MSGID: 0] [glusterd.c:476:glusterd_rpcsvc_options_build] 0-glusterd: listen-backlog value: 10
[2024-09-26 04:59:58.325537] D [rpcsvc.c:2512:rpcsvc_init] 0-rpc-service: RPC service inited.
[2024-09-26 04:59:58.325774] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2024-09-26 04:59:58.326045] D [rpc-transport.c:279:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib/x86_64-linux-gnu/glusterfs/3.13.2/rpc-transport/socket.so
[2024-09-26 04:59:58.331182] D [socket.c:4236:socket_init] 0-socket.management: Configued transport.tcp-user-timeout=0
[2024-09-26 04:59:58.331510] D [socket.c:4254:socket_init] 0-socket.management: Reconfigued transport.keepalivecnt=9
[2024-09-26 04:59:58.331754] D [socket.c:4339:socket_init] 0-socket.management: SSL support on the I/O path is NOT enabled
[2024-09-26 04:59:58.332007] D [socket.c:4342:socket_init] 0-socket.management: SSL support for glusterd is NOT enabled
[2024-09-26 04:59:58.332250] D [socket.c:4359:socket_init] 0-socket.management: using system polling thread
[2024-09-26 04:59:58.332498] D [name.c:598:server_fill_address_family] 0-socket.management: option address-family not specified, defaulting to inet
[2024-09-26 04:59:58.333000] D [rpc-transport.c:279:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib/x86_64-linux-gnu/glusterfs/3.13.2/rpc-transport/rdma.so
[2024-09-26 04:59:58.336458] W [MSGID: 103071] [rdma.c:4631:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2024-09-26 04:59:58.336856] W [MSGID: 103055] [rdma.c:4940:init] 0-rdma.management: Failed to initialize IB Device
[2024-09-26 04:59:58.337211] W [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2024-09-26 04:59:58.337784] W [rpcsvc.c:1770:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2024-09-26 04:59:58.338136] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2024-09-26 04:59:58.338525] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc peer, Num: 1238437, Ver: 2, Port: 0
[2024-09-26 04:59:58.338846] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli read-only, Num: 1238463, Ver: 2, Port: 0
[2024-09-26 04:59:58.339257] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt, Num: 1238433, Ver: 2, Port: 0
[2024-09-26 04:59:58.339583] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt v3, Num: 1238433, Ver: 3, Port: 0
[2024-09-26 04:59:58.339886] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, Ver: 1, Port: 0
[2024-09-26 04:59:58.340277] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Handshake, Num: 14398633, Ver: 2, Port: 0
[2024-09-26 04:59:58.340604] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster MGMT Handshake, Num: 1239873, Ver: 1, Port: 0
[2024-09-26 04:59:58.341030] D [rpcsvc.c:2512:rpcsvc_init] 0-rpc-service: RPC service inited.
[2024-09-26 04:59:58.341436] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2024-09-26 04:59:58.341807] D [rpc-transport.c:279:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib/x86_64-linux-gnu/glusterfs/3.13.2/rpc-transport/socket.so
[2024-09-26 04:59:58.342361] D [socket.c:4197:socket_init] 0-socket.management: disabling nodelay
[2024-09-26 04:59:58.342775] D [socket.c:4236:socket_init] 0-socket.management: Configued transport.tcp-user-timeout=0
[2024-09-26 04:59:58.343149] D [socket.c:4254:socket_init] 0-socket.management: Reconfigued transport.keepalivecnt=9
[2024-09-26 04:59:58.343483] D [socket.c:4339:socket_init] 0-socket.management: SSL support on the I/O path is NOT enabled
[2024-09-26 04:59:58.343780] D [socket.c:4342:socket_init] 0-socket.management: SSL support for glusterd is NOT enabled
[2024-09-26 04:59:58.344085] D [socket.c:4359:socket_init] 0-socket.management: using system polling thread
[2024-09-26 04:59:58.344571] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli, Num: 1238463, Ver: 2, Port: 0
[2024-09-26 04:59:58.344905] D [rpcsvc.c:2062:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Handshake (CLI Getspec), Num: 14398633, Ver: 2, Port: 0
[2024-09-26 04:59:58.345298] D [MSGID: 0] [glusterd-utils.c:7544:glusterd_sm_tr_log_init] 0-glusterd: returning 0
[2024-09-26 04:59:58.345626] D [MSGID: 0] [glusterd.c:1855:init] 0-management: cannot get run-with-valgrind value
File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 461
print svol_uuid
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(svol_uuid)?
[2024-09-26 04:59:58.398774] E [MSGID: 106229] [glusterd.c:525:glusterd_check_gsync_present] 0-glusterd: geo-replication module not working as desired
[2024-09-26 04:59:58.399350] D [MSGID: 0] [glusterd.c:535:glusterd_check_gsync_present] 0-glusterd: Returning -1
[2024-09-26 04:59:58.399683] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2024-09-26 04:59:58.399939] E [MSGID: 101066] [graph.c:327:glusterfs_graph_init] 0-management: initializing translator failed
[2024-09-26 04:59:58.400204] E [MSGID: 101176] [graph.c:698:glusterfs_graph_activate] 0-graph: init failed
[2024-09-26 04:59:58.401297] D [logging.c:1791:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 5 extra log messages
[2024-09-26 04:59:58.401572] D [logging.c:1794:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages
[2024-09-26 04:59:58.402081] W [glusterfsd.c:1393:cleanup_and_exit] (-->glusterd(glusterfs_volumes_init+0xf7) [0x559b83183ff7] -->glusterd(glusterfs_process_volfp+0x193) [0x559b83183eb3] -->glusterd(cleanup_and_exit+0x5a) [0x559b8318044a] ) 0-: received signum (-1), shutting down
[2024-09-26 04:59:58.402412] D [glusterfsd-mgmt.c:2691:glusterfs_mgmt_pmap_signout] 0-fsd-mgmt: portmapper signout arguments not given
The SyntaxError immediately caught my eye:
[2024-09-26 04:59:58.345626] D [MSGID: 0] [glusterd.c:1855:init] 0-management: cannot get run-with-valgrind value
File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 461
print svol_uuid
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(svol_uuid)?
This must be a Python error! I have worked on a couple of Python scripts before - and as I'm not that young anymore, I also experienced the switch from Python 2 to Python 3. The most significant syntax change between the two, at least to me, is the way the print function works.
Python2: print "Hello World\n" worked just fine
Python3: print("Hello World\n") requires parentheses
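This is easy to reproduce on the shell (a quick demonstration, not output from the affected node):

root@glusternode:~# python2 -c 'print "Hello World"'
Hello World
root@glusternode:~# python3 -c 'print "Hello World"'
  File "<string>", line 1
    print "Hello World"
                      ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Hello World")?

That is exactly the error from the glusterd debug output - gsyncd.py is Python 2 code being run by a Python 3 interpreter. And it explains the chain of failures in the log: during init, glusterd invokes the gsyncd.py helper for its geo-replication check (glusterd_check_gsync_present); the helper dies with the SyntaxError, the check fails ("geo-replication module not working as desired"), and the management translator - and with it glusterd - refuses to initialize.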
But why would this Python error be shown after the reboot? Has there been a change to the system?
And then I remembered! A few days prior to the system reboot, an Ansible playbook (basically our base configuration playbook, integrating the servers into monitoring, the Netbox inventory, etc.) was run on this server. And inside this playbook there's a specific task to pin /usr/bin/python to python3 using update-alternatives:
# Set alternatives for /usr/bin/python
- name: BASE - Check if python3 is installed
stat: path=/usr/bin/python3
register: python3
- name: BASE - Set alternatives for /usr/bin/python
alternatives: name=python link=/usr/bin/python path=/usr/bin/python3
when: python3.stat.exists == true and (ansible_distribution_release != "stretch")
This task is only executed if the /usr/bin/python3 binary exists and if the server is not running Debian Stretch (9), because Python 2 was still the default Python version in Debian 9.
And the very same situation applies to this GlusterFS server: it turns out to run Ubuntu 18.04 (Bionic), which also still uses Python 2 by default!
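In effect, the Ansible alternatives module did on this node what the following manual command would do (a sketch; the priority value is my assumption, not taken from the playbook run):

root@glusternode:~# update-alternatives --install /usr/bin/python python /usr/bin/python3 1

As glusterd kept running after the playbook run, the switched interpreter only became visible at the next service start - which happened to be the reboot.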
The playbook was adjusted to also ignore "bionic" releases, so it won't run into this problem anymore in the future (it probably won't anyway, as all Ubuntu 18.04 machines are EOL and must be upgraded).
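The adjusted condition might look like this (a sketch based on the task above, not the verbatim playbook):

- name: BASE - Set alternatives for /usr/bin/python
  alternatives: name=python link=/usr/bin/python path=/usr/bin/python3
  when: python3.stat.exists == true and (ansible_distribution_release != "stretch") and (ansible_distribution_release != "bionic")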
To fix the Python version on this GlusterFS node (reverting /usr/bin/python back to the default Python 2), I could follow my own how-to on pointing /usr/bin/python to a specific version:
root@glusternode:~# which python
/usr/bin/python
root@glusternode:~# /usr/bin/python -V
Python 3.6.9
root@glusternode:~# update-alternatives --install /usr/bin/python python /usr/bin/python2 1
root@glusternode:~# update-alternatives --list python
/usr/bin/python2
/usr/bin/python3
root@glusternode:~# update-alternatives --remove python /usr/bin/python3
update-alternatives: removing manually selected alternative - switching python to auto mode
update-alternatives: using /usr/bin/python2 to provide /usr/bin/python (python) in auto mode
root@glusternode:~# update-alternatives --list python
/usr/bin/python2
root@glusternode:~# python -V
Python 2.7.17
After the Python version change, the glusterd service could be started again:
root@glusternode:~# systemctl start glusterd
root@glusternode:~# systemctl status glusterd
[...]
Sep 26 09:15:52 glusternode systemd[1]: Starting GlusterFS, a clustered file-system server...
Sep 26 09:15:56 glusternode systemd[1]: Started GlusterFS, a clustered file-system server.
And a few seconds later the GlusterFS volume could be mounted again.
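To double-check that the node is healthy again, the peer and volume status can be consulted before mounting (the volume name and mount point below are placeholders):

root@glusternode:~# gluster peer status
[...]
root@glusternode:~# gluster volume info myvolume
[...]
root@glusternode:~# mount -t glusterfs glusternode:/myvolume /mnt/gluster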