One important, but very easily forgotten, fact about Galera clusters is the need of a PK (Primary Key) in the tables. Yes, every table. Galera uses the PK as unique identifier which row has changed in a table.
Using InnoDB tables without a PK might still work - but in the worst case they can break your cluster.
Image courtesy: Dolphin, Seal, Key
After a data node needed to be reset, it was added back to the cluster. With mariabackup as wsrep sst method, the sync should be fast and painless - and most importantly without downtime (the donor is not locked).
However just a few seconds after mariadb was started, the error log contained the following crash:
2021-05-01 17:06:04 0 [Note] WSREP: ####### Adjusting cert position: 420 -> 421
2021-05-01 17:06:04 0 [Note] WSREP: Service thread queue flushed.
2021-05-01 17:06:04 0 [Note] WSREP: Lowest cert index boundary for CC from preload: 137
2021-05-01 17:06:04 0 [Note] WSREP: Min available from gcache for CC from preload: 1
2021-05-01 17:06:04 21 [ERROR] InnoDB: Conflicting lock on table: `monitoring`.`monitoring` index: GEN_CLUST_INDEX that has lock
RECORD LOCKS space id 8 page no 3 n bits 72 index GEN_CLUST_INDEX of table `monitoring`.`monitoring` trx id 989 lock_mode X locks rec but not gap
Record lock, heap no 2
Record lock, heap no 3
Record lock, heap no 4
2021-05-01 17:06:04 21 [ERROR] InnoDB: WSREP state:
2021-05-01 17:06:04 21 [ERROR] WSREP: Thread BF trx_id: 990 thread: 21 seqno: 423 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: UPDATE monitoring SET mytime=1619880345 WHERE host = 'galera-db02-p'?i?
2021-05-01 17:06:04 21 [ERROR] WSREP: Thread BF trx_id: 989 thread: 10 seqno: 422 client_state: exec client_mode: high priority transaction_mode: executing applier: 1 toi: 0 local: 0 query: UPDATE monitoring SET mytime=1619880345 WHERE host = 'galera-db03-p'?i?
2021-05-01 17:06:04 0x7f199470a700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.16/storage/innobase/lock/lock0lock.cc line 687
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
210501 17:06:04 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.4.16-MariaDB-1:10.4.16+maria~buster
key_buffer_size=134217728
read_buffer_size=8388608
max_used_connections=0
max_threads=502
thread_count=23
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 12480612 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f1910000c08
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f1994709e08 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55d79898dcfe]
/usr/sbin/mysqld(handle_fatal_signal+0x54d)[0x55d79848205d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f1aa37c5730]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7f1aa2c0d7bb]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7f1aa2bf8535]
/usr/sbin/mysqld(+0x5ad6a2)[0x55d7981786a2]
/usr/sbin/mysqld(+0x58c2bf)[0x55d7981572bf]
/usr/sbin/mysqld(+0xaa1256)[0x55d79866c256]
/usr/sbin/mysqld(+0xaa5e22)[0x55d798670e22]
/usr/sbin/mysqld(+0xb2aace)[0x55d7986f5ace]
/usr/sbin/mysqld(+0xb30c41)[0x55d7986fbc41]
/usr/sbin/mysqld(+0xa60f52)[0x55d79862bf52]
/usr/sbin/mysqld(+0xa61283)[0x55d79862c283]
/usr/sbin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x12f)[0x55d798487a7f]
/usr/sbin/mysqld(_ZN14Rows_log_event8find_rowEP14rpl_group_info+0x586)[0x55d798581946]
/usr/sbin/mysqld(_ZN21Update_rows_log_event11do_exec_rowEP14rpl_group_info+0xe0)[0x55d798581e40]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x23c)[0x55d79857535c]
/usr/sbin/mysqld(_Z18wsrep_apply_eventsP3THDP14Relay_log_infoPKvm+0x1c4)[0x55d7983fe7b4]
/usr/sbin/mysqld(_ZN21Wsrep_applier_service15apply_write_setERKN5wsrep7ws_metaERKNS0_12const_bufferERNS0_14mutable_bufferE+0x9c)[0x55d7983e7f1c]
/usr/sbin/mysqld(+0xe34da2)[0x55d7989ffda2]
/usr/sbin/mysqld(_ZN5wsrep12server_state8on_applyERNS_21high_priority_serviceERKNS_9ws_handleERKNS_7ws_metaERKNS_12const_bufferE+0x15d)[0x55d798a00c3d]
/usr/sbin/mysqld(+0xe49214)[0x55d798a14214]
/usr/lib/libgalera_smm.so(+0x535fb)[0x7f1aa0ffd5fb]
/usr/lib/libgalera_smm.so(+0x63e70)[0x7f1aa100de70]
/usr/lib/libgalera_smm.so(+0x77e42)[0x7f1aa1021e42]
/usr/lib/libgalera_smm.so(+0x78396)[0x7f1aa1022396]
/usr/lib/libgalera_smm.so(+0x66d53)[0x7f1aa1010d53]
/usr/lib/libgalera_smm.so(+0x4248b)[0x7f1aa0fec48b]
/usr/sbin/mysqld(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0xe)[0x55d798a148de]
/usr/sbin/mysqld(+0x83573f)[0x55d79840073f]
/usr/sbin/mysqld(_Z15start_wsrep_THDPv+0x388)[0x55d7983f2a28]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f1aa37bafa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f1aa2ccf4cf]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f1a1c061dab): UPDATE monitoring SET mytime=1619880345 WHERE host = 'galera-db02-p'
Connection ID (thread ID): 21
Status: NOT_KILLED
Yes, sure, that's one big error logged, but the causing error mentions a lock on the table monitoring in the monitoring database:
2021-05-01 17:06:04 21 [ERROR] InnoDB: Conflicting lock on
table: `monitoring`.`monitoring` index: GEN_CLUST_INDEX that has lock
RECORD
LOCKS space id 8 page no 3 n bits 72 index GEN_CLUST_INDEX of table
`monitoring`.`monitoring` trx id 989 lock_mode X locks rec but not gap
Research of this error leads to a (once more) excellent article from the DBAs at severalnines. In this article the author, Krzystof Ksiazek, explains what the GEN_CLUST_INDEX log event actually means:
InnoDB requires a PK to be available - it uses PK as a clustered index, which is the main component of a table structure. No PK - no way to organize data within a table. If there’s no explicit PK defined on a table, InnoDB looks for UNIQUE keys to use for clustered index. If there’s no PK nor UNIQUE key available, an implicit PK is created - it’s referred to by InnoDB as ‘GEN_CLUST_INDEX’. If you see the entry like above, you can be sure that this particular table (db.table) doesn’t have a PK defined. - Krzystof Ksiazek
The table monitoring.monitoring is actually used and frequently updated by the monitoring plugin check_mysql_write. Its purpose is to verify write operations on a MySQL/MariaDB server. And because the author of this article is also the author of the mentioned monitoring plugin, I know for a fact that this monitoring table does not have a PK.
Another fun fact needed? Two years ago, in May 2019, I wrote an article how to monitor MySQL or MariaDB Galera Cluster writes and avoid deadlocks. Back then the plugin actually used a row identifer (id):
MariaDB [mymonitoring]> select * from monitoring;
+------+------------+
| id | mytime |
+------+------------+
| 1 | 1558676432 |
+------+------------+
1 row in set (0.00 sec)
However with the adjustments on the plugin's table schema in May 2019 this column "id" disappeared and was replaced by a column "host":
MariaDB [mymonitoring]> select * from monitoring;
+---------------------------+------------+
| host | mytime |
+---------------------------+------------+
| icinga1 | 1558680145 |
| icinga2 | 1558680107 |
| icinga3 | 1558680109 |
| icinga4 | 1558680131 |
| haproxy1 | 1558680149 |
| haproxy2 | 1558680148 |
+---------------------------+------------+
6 rows in set (0.00 sec)
This change was done to circumvent the Galera lock of the table - however the table lost its "id" column and PK. Big mistake.
Note: Although this monitoring table has been in use in multiple Galera clusters for the last years, a sync problem actually never arose - until this week with MariaDB 10.4.
To fix this problem, the structure of the table "monitoring" needs to be adjusted. The "id" column comes back as Primary Key of the table:
MariaDB [monitoring]> DROP TABLE monitoring.monitoring;
MariaDB [monitoring]> CREATE TABLE monitoring.monitoring ( id INT(3) NOT NULL AUTO_INCREMENT, host VARCHAR(100), mytime INT(13), PRIMARY KEY (id) );
On the plugin/script side, nothing needed to be adjusted, as the value of "id" is automatically increased thanks to AUTO_INCREMENT.
After this change was applied, the cluster was back in sync within a couple of minutes again.
The plugin's help and documentation were adjusted to contain the new CREATE TABLE command.
The lesson learned is clear: Check every table in a Galera cluster and verify that a PK exists. If not, add one (shoudn't be too hard). And if for some reason PK's don't work, question your own choice of Galera. Maybe you're better off using a classical replication in this case?
The MariaDB documentation mentions the need (or strong suggestion) to use PK in the known limitations of a Galera cluster:
All tables should have a primary key (multi-column primary keys are supported). DELETE operations are unsupported on tables without a primary key. Also, rows in tables without a primary key may appear in a different order on different nodes.
Problems in Galera Clusters are not always easy to spot. Need help troubleshooting a Galera cluster? Contact us on Infiniroot.com.
No comments yet.
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Observability Office OpenSearch PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder