MySQL (Percona) replication stopped due to a corrupt InnoDB .ibd file

I had to solve a strange MySQL replication issue today. All of a sudden, the Slave_SQL_Running changed to No and showed the following error message:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: XX.XX.XX.XX
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin.000481
          Read_Master_Log_Pos: 344928978
               Relay_Log_File: relay-log.001382
                Relay_Log_Pos: 293565267
        Relay_Master_Log_File: bin.000481
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1146
                   Last_Error: Error 'Table 'mydb.mytable_temp' doesn't exist' on query. Default database: 'mydb'. Query:
'CREATE TABLE `mytable_temp` (
[...]

Some investigation on the slave showed that a table called mydb.mytable_temp already existed there, which is why the CREATE TABLE statement failed.
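
For reference, a quick way to check this on the slave (database and table names taken from the slave error above) is something along these lines:

mysql> USE mydb;
mysql> SHOW TABLES LIKE 'mytable_temp';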

As a first step I stopped the slave, skipped the query causing the error (the CREATE TABLE statement mentioned above) and started the slave again to see what would happen next:

mysql> STOP SLAVE;
mysql> SET GLOBAL sql_slave_skip_counter = 1;
mysql> START SLAVE;

As soon as the slave was running again, the next query failed as well: an INSERT INTO mytable_temp. Although the table showed up in 'show tables', the INSERT didn't work either.

I decided to manually drop mytable_temp and re-create it with the query from the previous slave error, but oddly enough the table couldn't be dropped:

mysql> use mydb;
mysql> DROP TABLE mytable_temp;
ERROR 1051 (42S02) at line 1: Unknown table 'mytable_temp'

Well... that is strange. But although an error appeared right after the DROP, a new 'show tables' didn't show mytable_temp anymore. So it's gone. Right?
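
In hindsight, a look at the datadir at this point would have been revealing: SHOW TABLES goes by the .frm files, while InnoDB keeps its own data dictionary plus the .ibd tablespace files, so the two can disagree. Something like the following (datadir path as used in the rm command further down) would have shown the leftover tablespace file:

# ls -l /var/lib/mysql/mydb/mytable_temp.*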

Now that the table seemed to be gone (never mind the earlier error), I wanted to create it manually with the CREATE TABLE query seen in the first slave error:

mysql> CREATE TABLE mytable_temp (
    -> `entry_id` INT(11) NOT NULL,
[...]
    -> key text_index(data_text(4096))
    -> );

ERROR 1005 (HY000): Can't create table 'mydb.mytable_temp' (errno: -1)

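As an aside, MySQL ships a small perror utility to translate OS error numbers. The errno -1 above is just a generic internal failure, but for real errnos, such as the 17 mentioned in the error log below, it comes in handy:

# perror 17     # should print something like: OS error code  17:  File exists
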
What the hell?! I checked the MySQL error log and found the following entry:

# tail error.log
InnoDB: Error number 17 means 'File exists'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/operating-system-error-codes.html
InnoDB: The file already exists though the corresponding table did not
InnoDB: exist in the InnoDB data dictionary. Have you moved InnoDB
InnoDB: .ibd files around without using the SQL commands
InnoDB: DISCARD TABLESPACE and IMPORT TABLESPACE, or did
InnoDB: mysqld crash in the middle of CREATE TABLE? You can
InnoDB: resolve the problem by removing the file './mydb/mytable_temp.ibd'
InnoDB: under the 'datadir' of MySQL.

Oh wow... Incredible! There seems to be a problem with the .ibd file, which still exists even though the table was never created successfully. The error log suggests removing that particular file:

# rm /var/lib/mysql/mydb/mytable_temp.ibd

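If you prefer to play it safe, moving the file out of the way instead of deleting it works just as well and keeps a copy around, for example:

# mv /var/lib/mysql/mydb/mytable_temp.ibd /var/tmp/
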
Right after that I re-ran the CREATE TABLE query and this time... it worked! (Well, no surprise given the new information.)

All that was left to do was to start the slave again:

mysql> START SLAVE;

... and the slave was catching up just fine by doing all the INSERTs into that temp table.
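
To keep an eye on the catch-up, SHOW SLAVE STATUS can be watched on the slave: both Slave_IO_Running and Slave_SQL_Running should be Yes again, and Seconds_Behind_Master should drop towards 0.

mysql> SHOW SLAVE STATUS\G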

One question remains, however: what's the reason for this "corrupted" .ibd file? Did MySQL try to create the table and crash in the middle of it? Was the table left over from an earlier manual replication sync? Was there a file system issue?

According to the MySQL error log, the slave server does indeed seem to have crashed just before or during that operation:

130618 14:01:48  InnoDB: Assertion failure in thread 14 in file buf0buf.c line 4166
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
12:01:48 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=16777216
read_buffer_size=1048576
max_used_connections=3
max_threads=5000
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 10316071 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
/mysql/bin/mysqld'my_print_stacktrace+0x1e [0x8d214e]
/mysql/bin/mysqld'handle_fatal_signal+0x359 [0x7b7b79]
/lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffd7fff2412b6]
/lib/amd64/libc.so.1'call_user_handler+0x1c3 [0xfffffd7fff234db3]
/lib/amd64/libc.so.1'_lwp_kill+0xa [0xfffffd7fff247e3a] [Signal 6 (ABRT)]
/lib/amd64/libc.so.1'raise+0x19 [0xfffffd7fff1e6d79]
/lib/amd64/libc.so.1'abort+0x5d [0xfffffd7fff1c3a2d]
/mysql/bin/mysqld'buf_page_io_complete+0x41f [0x970f8f]
/mysql/bin/mysqld'fil_aio_wait+0x144 [0x9b12e4]
/mysql/bin/mysqld'io_handler_thread+0x48 [0x921538]
/lib/amd64/libc.so.1'_thrp_setup+0x83 [0xfffffd7fff240ef3]
/lib/amd64/libc.so.1'_lwp_start+0x0 [0xfffffd7fff2411f0]
Please read http://dev.mysql.com/doc/refman/5.1/en/resolve-stack-dump.html
and follow instructions on how to resolve the stack trace.
Resolved stack trace is much more helpful in diagnosing the
problem, so please do resolve it
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
130618 14:02:08 mysqld_safe mysqld restarted
130618 14:02:08 [Note] Plugin 'FEDERATED' is disabled.
130618 14:02:08 InnoDB: The InnoDB memory heap is disabled
130618 14:02:08 InnoDB: Mutexes and rw_locks use GCC atomic builtins
130618 14:02:08 InnoDB: Compressed tables use zlib 1.2.3
130618 14:02:08 InnoDB: Initializing buffer pool, size = 32.0G
130618 14:02:15 InnoDB: Completed initialization of buffer pool
130618 14:02:15 InnoDB: highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 777018804236
130618 14:02:16  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 777024047104
InnoDB: Doing recovery: scanned up to log sequence number 777029289984
InnoDB: Doing recovery: scanned up to log sequence number 777034532864
InnoDB: Doing recovery: scanned up to log sequence number 777039775744
InnoDB: Doing recovery: scanned up to log sequence number 777042170082
130618 14:02:21  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 293564055, file name bin.000481
InnoDB: and relay log file
InnoDB: position 0 293564195, file name /var/mysql/logs/relay-log.001382
InnoDB: Last MySQL binlog file position 0 107087858, file name /var/mysql/logs/bin.000021
130618 14:02:24 InnoDB: Restoring buffer pool pages from ib_lru_dump
130618 14:02:24  InnoDB: Waiting for the background threads to start
130618 14:02:25 Percona XtraDB (http://www.percona.com) 1.1.8-27.0 started; log sequence number 777042170082
handlersocket: not listening for reads
handlersocket: not listening for writes
handlersocket: initialized
130618 14:02:25 [Note] Recovering after a crash using /var/mysql/logs/bin
130618 14:02:25 [Note] Starting crash recovery...
130618 14:02:25 [Note] Crash recovery finished.
130618 14:02:25 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
130618 14:02:25 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
130618 14:02:25 [Note] Server socket created on IP: '0.0.0.0'.
130618 14:02:26 [Note] Slave SQL thread initialized, starting replication in log 'bin.000481' at position 293565127, relay log '/var/mysql/logs/relay-log.001382' position: 293565267
130618 14:02:26 [Note] Slave I/O thread: connected to master 'repl@xx.xx.xx.xx:3306',replication started in log 'bin.000481' at position 293648217
130618 14:02:26 [ERROR] Cannot find or open table mydb/mytable_temp from
the internal data dictionary of InnoDB though the .frm file for the
table exists. Maybe you have deleted and recreated InnoDB data
files but have forgotten to delete the corresponding .frm files
of InnoDB tables, or you have moved .frm files to another database?
or, the table contains indexes that this version of the engine
doesn't support.
See http://dev.mysql.com/doc/refman/5.5/en/innodb-troubleshooting.html
how you can resolve the problem.

So this is confirmed: the corruption was caused by a crash of the MySQL (Percona Server) process.
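
One last remark: because a statement was skipped with sql_slave_skip_counter and the table was re-created by hand, it can't hurt to verify that the slave is still consistent with the master for the affected table. A crude sanity check is comparing row counts on both servers (tools such as Percona's pt-table-checksum do this far more thoroughly):

mysql> SELECT COUNT(*) FROM mydb.mytable_temp;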

