After data corruption happened on a MySQL replication replica (slave) server, the databases needed to be fully synced again. In order to do this, the following steps are required:
Note: The old replication terms were "master-slave replication", consisting of a master and one or more slave servers. The new terminology uses "source-replica replication", consisting of a source and one or more replica servers.
The information, that the replica was out of sync, was retrieved by our monitoring, using the moniotring plugin check_mysql_slavestatus in the background:
***** Icinga *****
Notification Type: PROBLEM
Service: MySQL Replication Status
Host: mysql02
Address: 10.10.50.49
State: CRITICAL
Date/Time: 2021-04-22 07:54:34 +0200
Additional Info: CRITICAL: -h 10.10.50.49:-P 3306 Slave_SQL_Running: No
Comment: []
After an analysis what happened, it was decided to "clear" the replica and do a full data-sync from the source.
Note: A very good guide can be found on Remi Bergsma's article fixing an out-of-sync or corrupt MySQL slave.
On the source server, the replication state is reset and the tables must be locked to get a consistent state.
mysql> RESET MASTER;
Query OK, 0 rows affected (0.52 sec)
mysql> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.02 sec)
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 154 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
Save the information retrieved from the SHOW MASTER STATUS command. You'll need this later.
Then the database(s) can be saved using mysqldump:
root@mysql01:/backup# mysqldump --max_allowed_packet=512M --all-databases > mysqldump.sql
If a database contains large data fields, the --max_allowed_packet parameter should be used.
The dump should contain the CREATE DATABASE command:
root@mysql01:/backup# more mysqldump.sql
-- MySQL dump 10.13 Distrib 5.7.31, for Linux (x86_64)
--
-- Host: localhost Database:
-- ------------------------------------------------------
-- Server version 5.7.31-log
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
--
-- Current Database: `Jira`
--
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `Jira` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin */;
Now that the dump was successfully written, the tables can be unlocked:
mysql> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)
Now transfer the dump file to the replica server. If the dump is big, you should consider zipping it first:
root@mysql01:/backup# gzip mysqldump.sql
Now with the dump transferred to the replica server, the slave process needs to be stopped and reset:
mysql> STOP SLAVE;
Query OK, 0 rows affected (0.00 sec)
mysql> RESET SLAVE;
Query OK, 0 rows affected (0.00 sec)
To get a clean state and get rid of still remaining data corruption, I recommend to DROP all databases (except mysql internal databases information_schema, mysql, performance_schema and sys).
mysql> DROP DATABASE Jira;
Query OK, 0 rows affected (0.00 sec)
The database dump can now be loaded into the MySQL server:
root@mysql02:/backup# gunzip < mysqldump.sql.gz | mysql --max_allowed_packet=512M -u root
Once the dump was restored (without error), the replication settings need to be adjusted. Use the information retrieved from the master earlier:
mysql> CHANGE MASTER TO master_host='mysql01', master_user='repl', master_password='secret', master_log_file='mysql-bin.000001', master_log_pos=154;
Query OK, 0 rows affected, 2 warnings (0.02 sec)
mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)
In most cases, the replication should catch up since the master's database was unlocked.
In some cases the replication can have a couple of hiccups. In this case an error 1032 showed up in the SHOW SLAVE STATUS command:
mysql> SHOW SLAVE STATUS\G;
*************************** 1. row ***************************
Slave_IO_State: Queueing master event to the relay log
Master_Host: mysql01
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 40254923
Relay_Log_File: mysql02-relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Could not execute Update_rows event on table monitoring.monitoring; Can't find record in 'monitoring', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log mysql-bin.000001, end_log_pos 441
Skip_Counter: 0
Exec_Master_Log_Pos: 154
Relay_Log_Space: 40255304
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table monitoring.monitoring; Can't find record in 'monitoring', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log mysql-bin.000001, end_log_pos 441
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: e1e2a203-eb81-11ea-8f82-0050568d9dca
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 210423 07:27:58
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
If you search for that particular error, most results will simply tell you to skip the error and start the slave again. Although this is technically correct, you should first make sure that you understand the SQL error. In this case the error is caused by yet another monitoring plugin check_mysql_write, which updates a record in the monitoring table in the monitoring database every 10 seconds.
As this is only monitoring related and temporary data, this error can be skipped and the slave process started again:
mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)
But as I mentioned, check_mysql_write updates the table every 10 seconds, so there are a ton of these errors to skip. This can be solved doing a loop, by making sure the error is related to the monitoring database:
root@mysql02:~# while true; do if [[ $(mysql -e "SHOW SLAVE STATUS\G" | grep "Last_SQL_Error:" | grep -c "monitoring") -gt 0 ]]; then mysql -e "STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;"; else break; fi; sleep 1; done
This took several seconds until finally all errors related to the monitoring database were skipped and finally the replication was running again.
***** Icinga *****
Notification Type: RECOVERY
Service: MySQL Replication Status
Host: mysql02
Address: 10.10.50.49
State: OK
Date/Time: 2021-04-23 07:37:48 +0200
Additional Info: OK: Slave SQL running: Yes Slave IO running: Yes / master: mysql01 / slave is 0 seconds behind master
Comment: []
Certain databases don't need to be replicated. Such databases can be excluded from the replication by using the binlog_ignore_db configuration parameter. The monitoring database is a perfect example:
root@mysql01:~# grep "binlog_ignore_db" /etc/mysql/mysql.conf.d/mysqld.cnf
binlog_ignore_db = monitoring
Multiple databases can be specified by defining binlog_ignore_db several times:
root@mysql01:~# grep "binlog_ignore_db" /etc/mysql/mysql.conf.d/mysqld.cnf
binlog_ignore_db = monitoring
binlog_ignore_db = anotherdatabase
Note: The other way around (do not replicate anything, except this database) is also possible! By using the binlog_do_db parameter.
Amsvrid from France wrote on Jun 4th, 2021:
Thank you for the solution, this helped me :)
Claudio Kuenzler from Switzerland wrote on Jun 1st, 2021:
saurabh, there is no MariaDB 10.23 version. The newest as of today is 10.5. As mentioned before, you need to adjust your query to SHOW ALL SLAVES STATUS. See MariaDB documentation for more.
saurabh from wrote on Jun 1st, 2021:
Dear Sir,
I am using mariadb 10.23 version.How to use this command in terminal.Please guide me.
# while true; do if [[ $(mysql -e "SHOW SLAVE STATUS\G" | grep "Last_SQL_Error:" | grep -c "monitoring") -gt 0 ]]; then mysql -e "STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;"; else break; fi; sleep 1; done
Claudio Kuenzler from Switzerland wrote on May 31st, 2021:
saurabh, it looks like you are using a more frequent MariaDB version so SHOW SLAVE STATUS will not work. See the article MySQL and MariaDB multi source replication and how to monitor them with check_mysql_slavestatus for more information. To make it short, you need to adjust your commands to your MariaDB version, which would most likely be SHOW ALL SLAVES STATUS.
saurabh from wrote on May 29th, 2021:
dear sir i have maria db installed at starting all my replication work perfectly but after audit this error shows, i tried many process but still get error , i saw your blog post i think it will work for me but i don't know how to use that command i simply paste all (and shows error that i pasted below)..sir please help me how to use that command.
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows_v1 event on table template.ap_status; Can't find record in 'ap_status', Error_code: 1032; handle
r error HA_ERR_KEY_NOT_FOUND; the event's master log masterdb-bin.000008, end_log_pos 3263585
when i enter while loop command this error show below
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '[
[ $(mysql -e "SHOW SLAVE STATUS\G" | grep "Last_SQL_Error:" | grep -c "ap_st...' at line 1
MariaDB [(none)]>
Claudio Kuenzler from Switzerland wrote on May 29th, 2021:
saurabh, you need the mysql command. Usually this command comes from the mysql-client (or mariadb-client) package.
saurabh from wrote on May 29th, 2021:
sir as you mentioned in this post please guide me how to run this query i mentioned below .. i am getting-no command found ,please guide me sir
root@mysql02:~# while true; do if [[ $(mysql -e "SHOW SLAVE STATUS\G" | grep "Last_SQL_Error:" | grep -c "monitoring") -gt 0 ]]; then mysql -e "STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;"; else break; fi; sleep 1; done
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Office OpenSearch PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder