How to monitor MySQL or MariaDB Galera Cluster writes and avoid deadlocks

Written by Claudio Kuenzler - 0 comments

Published on May 24th 2019 - last updated on December 22nd 2020 - Listed in Database MySQL Monitoring Galera

check_mysql only checks read operation

To use a simple monitoring of a MySQL server (whether this is MySQL itself, MariaDB, Percona or other fancy forks), the standard plugin is - obviously - check_mysql, which is part of the monitoring-plugins package. However check_mysql only makes a read query as it retrieves some basic statistics from the MySQL server.

To monitor write operations as well (e.g. to catch a full file system) I've been using a simple shell script in the past few years: check_mysql_write.sh .

check_mysql_write

This shell script (from now on mentioned as plugin) is executed from the monitoring server and establishes a remote connection to the given database server:

$ ./check_mysql_write.sh -H dbhost -P port -u dbuser -p dbpass -d database

Note: Of course this can also be executed locally on the DB server.

The plugin relies on a table monitoring in a dedicated database. Here are the statements to prepare this monitoring table:

MariaDB [(none)]> CREATE TABLE mymonitoring.monitoring ( id INT(1), mytime INT(13) );
MariaDB [(none)]> INSERT INTO mymonitoring.monitoring (id, mytime) VALUES (1, 1421421409);

The plugin check_mysql_write would then update the table with the current timestamp, with the following UPDATE query:

UPDATE monitoring SET mytime=$curtime WHERE id=1

This results in the following table with content:

MariaDB [mymonitoring]> select * from monitoring;
+------+------------+
| id | mytime |
+------+------------+
| 1 | 1558676432 |
+------+------------+
1 row in set (0.00 sec)

Unless there was an error during the UPDATE query, the plugin returns OK and - hurray - you have a working MySQL server.

The problem with multi-master nodes

However there's a problem when running multi master nodes, or a Galera cluster. As the plugin is executed on multiple database hosts of the same cluster, this table is the same across the cluster. And because there's only one row with a unique ID (1) in this table, this causes problems.

On a three node Galera cluster with 4 monitoring servers and a check interval of 10 seconds, this single row is updated at least 12 times within 10 seconds. Forcibly the write operations overlap and this creates a deadlock. The UPDATE query would then get the following error:

ERROR 1213 (40001) at line 1: Deadlock found when trying to get lock; try restarting transaction

And the plugin would return "CRITICAL: There was an error trying to write into mymonitoring.monitoring. Do a manual check.".

Adjusting the check_mysql_write plugin

To overcome that deadlock situation, I adapted the structure of the monitoring table and the plugin itself. Instead of using a hard-coded unique ID, the table now has columns host and mytime. The host column is a VARCHAR type which has the values of the hostname on which the plugin runs. The new table is created like this:

MariaDB [(none)]> CREATE TABLE mymonitoring.monitoring ( host VARCHAR(100), mytime INT(13) );

The plugin now tries to update the timestamp of the row where the local hostname matches the hostname in the table:

UPDATE monitoring SET mytime=$curtime WHERE host = '$(hostname)'

Obviously if the plugin runs for the first time, there is no row with the local hostname and the plugin would fail. This is why the plugin itself creates the row at the very first run, if it can't find it in the table:

INSERT INTO monitoring (host, mytime) VALUES ('$(hostname)', $curtime)

Running the plugin on a Galera cluster

Running the plugin is straightforward:

$ /usr/lib/nagios/plugins/check_mysql_write.sh -H galeranode1 -u monitoring -p secret -d mymonitoring
OK: Write query successful (UPDATE monitoring SET mytime=1558683558 WHERE host='icinga1')

Hurray, the write operation was successful. The plugins which run all over different monitoring hosts created the following entries in the monitoring table:

MariaDB [mymonitoring]> select * from monitoring;
+---------------------------+------------+
| host                      | mytime     |
+---------------------------+------------+
| icinga1           | 1558680145 |
| icinga2               | 1558680107 |
| icinga3                 | 1558680109 |
| icinga4                 | 1558680131 |
| haproxy1    | 1558680149 |
| haproxy2    | 1558680148 |
+---------------------------+------------+
6 rows in set (0.00 sec)

The plugin is now public

The plugin check_mysql_write can now be used by everyone interested. You can find it on GitHub: https://github.com/Napsty/check_mysql_write .

Need help in Galera troubleshooting?

Problems in Galera Clusters are not always easy to spot. Need help troubleshooting a Galera cluster? Contact us on Infiniroot.com.

Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.

Blog Tags:

AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Observability Office OpenSearch PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder