MySQL replication stopped due to a disk-full situation on the MySQL master server. Here are notes on how I repaired it.
Due to /opt being full on the DB master, the replication crashed. The symptoms on the DB slave were:
    PCUDSSW1513> sudo tail -100 /var/log/mysqld.log
    120524 9:43:48 [ERROR] Got fatal error 1236: 'binlog truncated in the middle of event' from master when reading data from binary log
    120524 9:43:48 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000063', position 399760643
Checked the given binary log file on the DB master:
    PCUDSSX1501 2 /opt/mysql-data> sudo -u mysql mysqlbinlog --start-position 399760643 'mysql-bin.000063'
    /*!40019 SET @@session.max_insert_delayed_threads=0*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 14597, event_type: 2
    Could not read entry at offset 399760643:Error in log format or read error
    DELIMITER ;
    # End of log file
    ROLLBACK /* added by mysqlbinlog */;
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
Similar troubles occurred for some positions before that one, so I looked at the whole file:
    PCUDSSX1501 2 /opt/mysql-data> sudo -u mysql mysqlbinlog 'mysql-bin.000063' > /opt/simko/zzz.sql
    ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 14597, event_type: 2
    Could not read entry at offset 399760643:Error in log format or read error
and:
    PCUDSSX1501 2 /opt/mysql-data> tail -10 /opt/simko/zzz.sql
    /*!*/;
    # at 399760393
    #120523 22:22:40 server id 201101501 end_log_pos 399760643 Query thread_id=77556 exec_time=0 error_code=0
    SET TIMESTAMP=1337804560/*!*/;
    INSERT INTO rnkDOWNLOADS (id_bibrec,id_bibdoc,file_version,file_format,id_user,client_host,download_time) VALUES (1157741,94200,'','GIF;ICON',0,INET_ATON('24.222.171.86'),NOW())
    /*!*/;
    DELIMITER ;
    # End of log file
    ROLLBACK /* added by mysqlbinlog */;
    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
Checked whether this data exists on the DB slave:
    mysql> SELECT * FROM rnkDOWNLOADS WHERE id_bibrec=1157741 AND id_bibdoc=94200 AND client_host=INET_ATON('24.222.171.86');
    +-----------+---------------------+-------------+---------+-----------+--------------+-------------+---------+------------------+
    | id_bibrec | download_time       | client_host | id_user | id_bibdoc | file_version | file_format | referer | display_position |
    +-----------+---------------------+-------------+---------+-----------+--------------+-------------+---------+------------------+
    |   1157741 | 2012-05-23 22:22:40 |   417246038 |       0 |     94200 |            0 | GIF;ICON    | NULL    |                0 |
    +-----------+---------------------+-------------+---------+-----------+--------------+-------------+---------+------------------+
    1 row in set (0.87 sec)
Seems it does.
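As a quick side check (not part of the original session), the client_host value 417246038 stored on the slave is just the integer form of the IP address; converting it back with MySQL's INET_NTOA() function gives 24.222.171.86, i.e. the same address as in the INET_ATON('24.222.171.86') call seen in the binary log event:

    mysql> SELECT INET_NTOA(417246038);
    +----------------------+
    | INET_NTOA(417246038) |
    +----------------------+
    | 24.222.171.86        |
    +----------------------+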
Indeed, the end_log_pos value of 399760643 seen in the zzz.sql file above corresponds exactly to the DB slave's executed position of 399760643:
    mysql> SHOW SLAVE STATUS\G;
    *************************** 1. row ***************************
              Slave_IO_State:
                 Master_Host: 137.138.198.204
                 Master_User: repl
                 Master_Port: 3306
               Connect_Retry: 60
             Master_Log_File: mysql-bin.000063
         Read_Master_Log_Pos: 399760643
              Relay_Log_File: mysqld-relay-bin.000191
               Relay_Log_Pos: 235
       Relay_Master_Log_File: mysql-bin.000063
         Exec_Master_Log_Pos: 399760643
Hence it seems that everything the DB master had written to the binary log file before the crash was correctly replicated on the DB slave.
Consequently, let's try to restart the replication by forcing the next available position. This is not the cleanest solution, but it may work. (See also the P.S. below.)
The start of the next binary log file on the DB master looks like this:
    (sudo -u mysql mysqlbinlog 'mysql-bin.000064' | head -15)
    /*!40019 SET @@session.max_insert_delayed_threads=0*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    # at 4
    #120523 23:54:51 server id 201101501 end_log_pos 98 Start: binlog v 4, server v 5.0.95-log created 120523 23:54:51 at startup
    ROLLBACK/*!*/;
    # at 98
    #120523 23:54:53 server id 201101501 end_log_pos 345 Query thread_id=5 exec_time=0 error_code=0
    use cdsweb/*!*/;
    SET TIMESTAMP=1337810093/*!*/;
    SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1, @@session.unique_checks=1/*!*/;
    SET @@session.sql_mode=0/*!*/;
    /*!\C utf8 *//*!*/;
    SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
    SET @@session.time_zone='SYSTEM'/*!*/;
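Position 4 is the very beginning of the binary log (right after the 4-byte magic header), and position 98 is the end of the initial format description event, so 98 is the first real event boundary the slave can be pointed at. As an extra sanity check (a suggested step, not part of the original session), one could first verify that mysqlbinlog reads the file cleanly starting from that offset:

    sudo -u mysql mysqlbinlog --start-position 98 'mysql-bin.000064' | head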
Let's try to restart the DB replication from file 64, position 98, by executing on the DB slave:
    mysql> STOP SLAVE;
    mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000064',MASTER_LOG_POS=98;
    mysql> START SLAVE;
This restarted the replication.
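To confirm that both replication threads came up, one can check the Slave_IO_Running and Slave_SQL_Running fields of SHOW SLAVE STATUS; both should read Yes:

    mysql> SHOW SLAVE STATUS\G
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes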
Since almost a week has elapsed since the DB master crashed, the DB slave is quite behind:
    mysql> SHOW SLAVE STATUS\G;
        Seconds_Behind_Master: 560616
But the replication is running again, catching up with the past; 560616 seconds is about six and a half days of backlog. Let's see how much time it will take for the slave to catch up.
P.S. For additional safety, one may prefer not to rely on such an analysis of a truncated binary log file, but rather restart the DB replication from scratch, when a service window permits.
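For the record, here is a minimal sketch of how such a from-scratch restart could look, assuming a root account on both servers and enough free disk space for the dump (the file name full-dump.sql is just an example); the --master-data=1 option makes mysqldump embed the matching CHANGE MASTER TO coordinates directly in the dump:

    # on the DB master: take a full dump including the binary log coordinates
    mysqldump -u root -p --all-databases --master-data=1 > full-dump.sql

    # on the DB slave: stop replication, reload the dump, restart replication
    mysql -u root -p -e 'STOP SLAVE'
    mysql -u root -p < full-dump.sql
    mysql -u root -p -e 'START SLAVE'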