Pages

Thursday, February 3, 2011

Troubleshooting : Mysql replication

One of the most important procedure in setting up master-slave replication on Mysql database successfully is to have a consistent dump or backup of database(s) from the master server by locking all tables during mysqldump so write access is not possible.
master log file and log position will be taken from the master and configure it on the slave server. But it’s not always good times. There are some bad times too. And one of it is sudden stop of your slave and no longer synching with the master database.
I’ll give you a scenario. Before the replication starts, my slave server has an existing database named mifos. Then you setup your replication without any problems and all goes well. Then one time, somebody created a database server named 'mifos' on the master server with a different tables or structure. Then that might confuse now the mysql process. You can see errors like cannot create database etc.
Another common problem also are duplicate entries caused by inconsistent backups that was restored on the slave server.
To check, inspect your mysqld.logs for errors, something like
#less /var/log/mysqld.log
100318  5:34:46 [ERROR] Slave: Error ‘Duplicate entry ‘mifos-2010-01-30′ for key 1′ on query. Default database: ‘mifosdb’. Query: ‘INSERT INTO mifos_loan_schedule(keyword,requested_date,request_count,mifos_id) VALUES (‘mifos’,’2010-01-30”)’, Error_code: 1062
On mysql prompt ,try to check the status of the slave by:
#mysql -u root -p
mysql> show slave status \G;

Slave_IO_Running: Yes
Slave_SQL_Running: No

You will see from that the Slave_SQL_Running is set to No, indicating that the replication is broken.
To repair the replication, here are the steps that you can try:
mysql> SLAVE STOP;
mysql>SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
mysql>SLAVE START;
mysql>SHOW SLAVE STATUS \G

mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.10.10.10
Master_User: slaveuser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mifos-bin.000058
Read_Master_Log_Pos: 470394670
Relay_Log_File: mysqld-relay-bin.000004
Relay_Log_Pos: 123918387
Relay_Master_Log_File:mifos-bin.000058
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 123918248
Relay_Log_Space: 5839136723
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master:0
1 row in set (0.00 sec)

As appeared above, Slave_IO_Running is Yes and Slave_SQL_Running is now set to Yes. Slave State is waiting for master to send Event.
From the mysqld log file, a similar output such as this will appear
100318 23:15:45 [Note] Slave SQL thread initialized, starting replication in log ‘mifos-bin.000058′ at position 603115919, relay log ‘./mysqld-relay-bin.000001′ position: 4
100318 23:15:45 [Note] Slave I/O thread: connected to master
‘slaveuser@10.10.10.10:3306′,  replication started in log ‘mifos-bin.000051′ at position 603115919
If you still have duplicate entries, just rerun the process. But how if are too many and doing all over again is quite stupid already. Then you have an option to put the skip errors to mysql configuration
on /etc/my.cnf
add this line under [mysqld]:
slave-skip-errors=1062
Then restart mysql.

No comments: