How to monitor MySQL or MariaDB Galera Cluster writes and avoid deadlocks

Written by - 0 comments

Published on - last updated on December 22nd 2020 - Listed in Database MySQL Monitoring Galera


check_mysql only checks read operation

To use a simple monitoring of a MySQL server (whether this is MySQL itself, MariaDB, Percona or other fancy forks), the standard plugin is - obviously - check_mysql, which is part of the monitoring-plugins package. However check_mysql only makes a read query as it retrieves some basic statistics from the MySQL server.

To monitor write operations as well (e.g. to catch a full file system) I've been using a simple shell script in the past few years: check_mysql_write.sh .

check_mysql_write

This shell script (from now on mentioned as plugin) is executed from the monitoring server and establishes a remote connection to the given database server:

$ ./check_mysql_write.sh -H dbhost -P port -u dbuser -p dbpass -d database

Note: Of course this can also be executed locally on the DB server.

The plugin relies on a table monitoring in a dedicated database. Here are the statements to prepare this monitoring table:

MariaDB [(none)]> CREATE TABLE mymonitoring.monitoring ( id INT(1), mytime INT(13) );
MariaDB [(none)]> INSERT INTO mymonitoring.monitoring (id, mytime) VALUES (1, 1421421409);

The plugin check_mysql_write would then update the table with the current timestamp, with the following UPDATE query:

UPDATE monitoring SET mytime=$curtime WHERE id=1

This results in the following table with content:

MariaDB [mymonitoring]> select * from monitoring;
+------+------------+
| id   | mytime     |
+------+------------+
|    1 | 1558676432 |
+------+------------+
1 row in set (0.00 sec)

Unless there was an error during the UPDATE query, the plugin returns OK and - hurray - you have a working MySQL server.

The problem with multi-master nodes

However there's a problem when running multi master nodes, or a Galera cluster. As the plugin is executed on multiple database hosts of the same cluster, this table is the same across the cluster. And because there's only one row with a unique ID (1) in this table, this causes problems.

On a three node Galera cluster with 4 monitoring servers and a check interval of 10 seconds, this single row is updated at least 12 times within 10 seconds. Forcibly the write operations overlap and this creates a deadlock. The UPDATE query would then get the following error:

ERROR 1213 (40001) at line 1: Deadlock found when trying to get lock; try restarting transaction

And the plugin would return "CRITICAL: There was an error trying to write into mymonitoring.monitoring.  Do a manual check.".

Adjusting the check_mysql_write plugin

To overcome that deadlock situation, I adapted the structure of the monitoring table and the plugin itself. Instead of using a hard-coded unique ID, the table now has columns host and mytime. The host column is a VARCHAR type which has the values of the hostname on which the plugin runs. The new table is created like this:

MariaDB [(none)]> CREATE TABLE mymonitoring.monitoring ( host VARCHAR(100), mytime INT(13) );

The plugin now tries to update the timestamp of the row where the local hostname matches the hostname in the table:

UPDATE monitoring SET mytime=$curtime WHERE host = '$(hostname)'

Obviously if the plugin runs for the first time, there is no row with the local hostname and the plugin would fail. This is why the plugin itself creates the row at the very first run, if it can't find it in the table:

INSERT INTO monitoring (host, mytime) VALUES ('$(hostname)', $curtime)

Running the plugin on a Galera cluster

Running the plugin is straightforward:

$ /usr/lib/nagios/plugins/check_mysql_write.sh -H galeranode1 -u monitoring -p secret -d mymonitoring
OK: Write query successful (UPDATE monitoring SET mytime=1558683558 WHERE host='icinga1')

Hurray, the write operation was successful. The plugins which run all over different monitoring hosts created the following entries in the monitoring table:

MariaDB [mymonitoring]> select * from monitoring;
+---------------------------+------------+
| host                      | mytime     |
+---------------------------+------------+
| icinga1                   | 1558680145 |
| icinga2                   | 1558680107 |
| icinga3                   | 1558680109 |
| icinga4                   | 1558680131 |
| haproxy1                  | 1558680149 |
| haproxy2                  | 1558680148 |
+---------------------------+------------+
6 rows in set (0.00 sec)

The plugin is now public

The plugin check_mysql_write can now be used by everyone interested. You can find it on GitHub: https://github.com/Napsty/check_mysql_write .

Need help in Galera troubleshooting?

Problems in Galera Clusters are not always easy to spot. Need help troubleshooting a Galera cluster? Contact us on Infiniroot.com.


Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.

RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Observability   Office   OpenSearch   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder