For a new setup with an HA database, I decided to create a Galera cluster, as I have already installed a couple of Galera clusters so far (see "MySQL Galera cluster not starting (failed to open channel)"). But this time I decided to create a two-node cluster with an Arbitrator service to handle split-brain situations.
The Galera Arbitrator service is a daemon process (garbd) which simply connects to the Galera cluster and is from then on a member of the cluster. However, no databases are synced to its disk: it's a pure voting member, not a data node. That works great for this scenario, because we have a dual data center setup anyway and I don't need three copies of the same data across two data centers.
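Why the third member matters can be shown with a bit of arithmetic. The following is only a sketch and assumes that every member counts equally (no custom weights) and that quorum means a strict majority of the members. The function name has_quorum is made up for this example:

```shell
#!/bin/sh
# has_quorum REACHABLE TOTAL
# Succeeds when REACHABLE members are a strict majority of TOTAL members.
# Assumption: all members carry the same weight.
has_quorum() {
    reachable=$1
    total=$2
    [ $((reachable * 2)) -gt "$total" ]
}

# Two data nodes only: if one is lost, 1 of 2 is not a majority.
if has_quorum 1 2; then echo "1 of 2: quorum"; else echo "1 of 2: no quorum"; fi

# Two data nodes plus garbd: if one member is lost, 2 of 3 is still a majority.
if has_quorum 2 3; then echo "2 of 3: quorum"; else echo "2 of 3: no quorum"; fi
```

With only two members, the surviving node cannot tell a crashed peer from a network split, so it loses quorum. With garbd as third member, the side that can still reach the arbitrator keeps a majority.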
I created a config file for garbd, according to the Galera Arbitrator documentation:
root@garb:~# cat /etc/garbd.conf
# arbtirator.config
group = ATLDB
address = gcomm://10.161.206.45,10.161.206.46
But when I tried to start garbd, it failed:
root@garb:~# garbd --cfg /etc/garbd.conf
2017-03-29 15:47:01.480 INFO: CRC-32C: using hardware acceleration.
2017-03-29 15:47:01.480 INFO: Read config:
daemon: 0
name: garb
address: gcomm://10.161.206.45,10.161.206.46
group: ATLDB
sst: trivial
donor:
options: gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes
cfg: /etc/garbd.conf
log:
I came across a GitHub issue which stated that the port numbers must be given explicitly:
"garbd" consistently failed to start unless the configuration [...] explicitly provided the port number.
Important here is to note that we're talking about Galera ports, not MySQL/MariaDB ports (3306).
The default Galera port is 4567 and can be verified on one of the Galera data nodes:
root@galera-node1:~# netstat -lntup | grep mysql
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 2971/mysqld
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 2971/mysqld
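The same check can be scripted. This is a minimal sketch, assuming netstat output in the column layout shown above (local address in column 4, state in column 6). The function name has_listener is made up for this example, and the sample lines are copied from the output above so the snippet runs without a Galera node:

```shell
#!/bin/sh
# has_listener PORT
# Reads "netstat -lnt" style output on stdin and succeeds if a TCP socket
# is in LISTEN state on PORT.
has_listener() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The port is whatever follows the last colon of the local address.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Sample input taken from the data node output above.
sample='tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 2971/mysqld
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 2971/mysqld'

if printf '%s\n' "$sample" | has_listener 4567; then
    echo "Galera port 4567 is listening"
else
    echo "nothing listening on 4567"
fi
```

On a real data node the input would come from something like "netstat -lnt | has_listener 4567".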
Using the port 4567, I adapted /etc/garbd.conf:
root@garb:~# cat /etc/garbd.conf
# arbtirator.config
group = ATLDB
address = gcomm://10.161.206.45:4567,10.161.206.46:4567
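If the address list is generated by a script, the port can be appended automatically. The following is only a sketch: the function name with_galera_port is made up, and it assumes IPv4 addresses or hostnames (an IPv6 address contains colons and would be mistaken for an entry that already has a port):

```shell
#!/bin/sh
# with_galera_port ADDRESS [PORT]
# Prints the gcomm:// address with PORT (default 4567) appended to every
# entry that does not carry a port yet.
# Assumption: entries are IPv4 addresses or hostnames, not IPv6.
with_galera_port() {
    addr=${1#gcomm://}
    port=${2:-4567}
    out=
    old_ifs=$IFS
    IFS=,
    for host in $addr; do
        case $host in
            *:*) ;;                     # port already present, keep as is
            *)   host="$host:$port" ;;  # no port, append the default
        esac
        out="${out:+$out,}$host"
    done
    IFS=$old_ifs
    printf 'gcomm://%s\n' "$out"
}

with_galera_port gcomm://10.161.206.45,10.161.206.46
# prints gcomm://10.161.206.45:4567,10.161.206.46:4567
```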
Start test:
root@garb:~# garbd --cfg /etc/garbd.conf
2017-03-29 15:48:31.289 INFO: CRC-32C: using hardware acceleration.
2017-03-29 15:48:31.289 INFO: Read config:
daemon: 0
name: garb
address: gcomm://10.161.206.45:4567,10.161.206.46:4567
group: ATLDB
sst: trivial
donor:
options: gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes
cfg: /etc/garbd.conf
log:
2017-03-29 15:48:31.290 INFO: protonet asio version 0
2017-03-29 15:48:31.290 INFO: Using CRC-32C for message checksums.
2017-03-29 15:48:31.290 INFO: backend: asio
2017-03-29 15:48:31.290 INFO: gcomm thread scheduling priority set to other:0
2017-03-29 15:48:31.290 WARN: access file(./gvwstate.dat) failed(No such file or directory)
2017-03-29 15:48:31.290 INFO: restore pc from disk failed
2017-03-29 15:48:31.291 INFO: GMCast version 0
2017-03-29 15:48:31.291 INFO: (6520b85a, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-03-29 15:48:31.291 INFO: (6520b85a, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-03-29 15:48:31.291 INFO: EVS version 0
2017-03-29 15:48:31.291 INFO: gcomm: connecting to group 'ATLDB', peer '10.161.206.45:4567,10.161.206.46:4567'
2017-03-29 15:48:31.293 INFO: (6520b85a, 'tcp://0.0.0.0:4567') connection established to 6a1ea4ef tcp://10.161.206.45:4567
2017-03-29 15:48:31.293 INFO: (6520b85a, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2017-03-29 15:48:31.296 INFO: (6520b85a, 'tcp://0.0.0.0:4567') connection established to 5d311f46 tcp://10.161.206.46:4567
2017-03-29 15:48:31.585 INFO: declaring 5d311f46 at tcp://10.161.206.46:4567 stable
2017-03-29 15:48:31.585 INFO: declaring 6a1ea4ef at tcp://10.161.206.45:4567 stable
2017-03-29 15:48:31.586 INFO: Node 5d311f46 state prim
2017-03-29 15:48:31.587 INFO: view(view_id(PRIM,5d311f46,5) memb {
5d311f46,0
6520b85a,0
6a1ea4ef,0
} joined {
} left {
} partitioned {
})
2017-03-29 15:48:31.587 INFO: save pc into disk
2017-03-29 15:48:31.792 INFO: gcomm: connected
2017-03-29 15:48:31.792 INFO: Changing maximum packet size to 64500, resulting msg size: 32636
2017-03-29 15:48:31.792 INFO: Shifting CLOSED -> OPEN (TO: 0)
2017-03-29 15:48:31.792 INFO: Opened channel 'ATLDB'
2017-03-29 15:48:31.792 INFO: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2017-03-29 15:48:31.792 INFO: STATE EXCHANGE: Waiting for state UUID.
2017-03-29 15:48:31.792 INFO: STATE EXCHANGE: sent state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23
2017-03-29 15:48:31.792 INFO: STATE EXCHANGE: got state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23 from 0 (inf-atldb02-p)
2017-03-29 15:48:31.792 INFO: STATE EXCHANGE: got state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23 from 2 (inf-atldb01-p)
2017-03-29 15:48:31.793 INFO: STATE EXCHANGE: got state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23 from 1 (garb)
2017-03-29 15:48:31.793 INFO: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 4,
members = 2/3 (joined/total),
act_id = 0,
last_appl. = -1,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 6a1f102a-13a3-11e7-b710-b2876418a643
2017-03-29 15:48:31.793 INFO: Flow-control interval: [9999999, 9999999]
2017-03-29 15:48:31.793 INFO: Shifting OPEN -> PRIMARY (TO: 0)
2017-03-29 15:48:31.793 INFO: Sending state transfer request: 'trivial', size: 7
2017-03-29 15:48:31.795 INFO: Member 1.0 (garb) requested state transfer from '*any*'. Selected 0.0 (inf-atldb02-p)(SYNCED) as donor.
2017-03-29 15:48:31.795 INFO: Shifting PRIMARY -> JOINER (TO: 0)
2017-03-29 15:48:31.796 INFO: 0.0 (inf-atldb02-p): State transfer to 1.0 (garb) complete.
2017-03-29 15:48:31.796 INFO: 1.0 (garb): State transfer from 0.0 (inf-atldb02-p) complete.
2017-03-29 15:48:31.796 INFO: Shifting JOINER -> JOINED (TO: 0)
2017-03-29 15:48:31.797 INFO: Member 0.0 (inf-atldb02-p) synced with group.
2017-03-29 15:48:31.797 INFO: Member 1.0 (garb) synced with group.
2017-03-29 15:48:31.797 INFO: Shifting JOINED -> SYNCED (TO: 0)
It does indeed look better now! A verification on data node 1 confirmed that the cluster size increased from 2 to 3:
root@galera-node1:~# mysql -B -e "SHOW STATUS WHERE variable_name ='wsrep_local_state_comment' \
OR variable_name ='wsrep_cluster_size' \
OR variable_name ='wsrep_incoming_addresses' \
OR variable_name ='wsrep_cluster_status' \
OR variable_name ='wsrep_connected' \
OR variable_name ='wsrep_ready' \
OR variable_name ='wsrep_local_state_uuid' \
OR variable_name ='wsrep_cluster_state_uuid';"
Variable_name Value
wsrep_cluster_size 3
wsrep_cluster_state_uuid 6a1f102a-13a3-11e7-b710-b2876418a643
wsrep_cluster_status Primary
wsrep_connected ON
wsrep_incoming_addresses ,10.161.206.46:3306,10.161.206.45:3306
wsrep_local_state_comment Synced
wsrep_local_state_uuid 6a1f102a-13a3-11e7-b710-b2876418a643
wsrep_ready ON
Note that the garbd machine doesn't show up with an address in the row "wsrep_incoming_addresses". Its entry is simply empty (note the leading comma). That makes sense, because there is no MySQL running on the Arbitrator Service machine, ergo no 3306 listener.
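The verification can also be turned into a small check, for example for monitoring. This is a sketch under the assumption that the status rows arrive tab-separated, as "mysql -B" prints them. The function name check_cluster is made up for this example:

```shell
#!/bin/sh
# check_cluster EXPECTED_SIZE
# Reads tab-separated "Variable_name<TAB>Value" rows on stdin and succeeds
# only if wsrep_cluster_size equals EXPECTED_SIZE and wsrep_cluster_status
# is Primary. On failure it prints the values it found.
check_cluster() {
    awk -F '\t' -v want="$1" '
        $1 == "wsrep_cluster_size"   { size = $2 }
        $1 == "wsrep_cluster_status" { status = $2 }
        END {
            if (size == want && status == "Primary") exit 0
            printf "cluster_size=%s cluster_status=%s\n", size, status
            exit 1
        }
    '
}

# Sample rows matching the output above, so the snippet runs without a database.
printf 'wsrep_cluster_size\t3\nwsrep_cluster_status\tPrimary\n' \
    | check_cluster 3 && echo "cluster OK"
```

On a data node the input would come from a query such as mysql -B -e "SHOW STATUS LIKE 'wsrep_cluster_%'" piped into check_cluster 3.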