在高并发网站架构中,MySQL数据库主从同步是不可或缺的,不过经常会发生由于网络原因或者操作错误,MySQL主从经常会出现不同步的情况,那么如何监控MySQL主从同步,也变成网站正常运行的重要环节。
MySQL同步功能由3个线程(master上1个,slave上2个)来实现,简单的说就是:master发送日志一个,slave接收日志一个,slave运行日志一个。
首先,我们解释一下 show slave status 中重要的几个参数:
Slave_IO_Running: I/O线程是否被启动并成功地连接到主服务器上。
Slave_SQL_Running: SQL线程是否被启动。
Seconds_Behind_Master:和主库比同步延迟的秒数
本字段是从属服务器“落后”多少的一个指示。当从属SQL线程正在运行时(处理更新),本字段为在主服务器上由此线程执行的最近的一个事件的时间标记开始,已经过的秒数。当此线程被从属服务器I/O线程赶上,并进入闲置状态,等待来自I/O线程的更多的事件时,本字段为零。总之,本字段测量从属服务器SQL线程和从属服务器I/O线程之间的时间差距,单位以秒计。
如何监控从服务器是否正常运行呢?
[root@slave ~]# mysql -u root -proot -e "show slave status\G;" *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.0.31 #当前的mysql master服务器主机 Master_User: myslave Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000003 Read_Master_Log_Pos: 471 Relay_Log_File: relay-log-bin.000002 Relay_Log_Pos: 252 Relay_Master_Log_File: master-bin.000003 Slave_IO_Running: Yes #I/O线程是否被启动并成功地连接到主服务器上。 Slave_SQL_Running: Yes #SQL线程是否被启动。 Master_SSL_Key: Seconds_Behind_Master: 0 #和主库比同步延迟的秒数
下面编写shell脚本监控mysql主从同步,实现如下功能:
阶段1:开发一个守护进程脚本每30秒实现检测一次。
阶段2:如果同步出现如下错误号(1158,1159,1008,1007,1062),请跳过错误
阶段3:请使用数组技术实现上述脚本(获取主从判断及错误号部分)
#!/bin/bash mysql_cmd="mysql -u root -proot" errornum=(1158 1159 1008 1007 1062) while true do array=($($mysql_cmd -e "show slave status\G"|egrep ‘_Running|Behind_Master|Last_SQL_Errno‘|awk ‘{print $NF}‘)) if [ "${array[0]}" == "Yes" -a "${array[1]}" == "Yes" -a "${array[2]}" == "0" ] then echo "MySQL is slave is running" else for ((i=0;i<${#errornum[*]};i++)) do if [ "${array[3]}" = "${errornum[$i]}" ];then $mysql_cmd -e "stop slave &&set global sql_slave_skip_counter=1;start slave;" fi done char="MySQL slave is downed" echo "$char" echo "$char"|mail -s "$char" xxxxx@qq.com break fi sleep 30 done
#!/bin/bash mysql_cmd="mysql -u root -proot" errornum=(1158 1159 1008 1007 1062) while true do array=($($mysql_cmd -e "show slave status\G"|egrep ‘_Running|Behind_Master|Last_SQL_Errno‘|awk ‘{print $NF}‘)) if [ "${array[0]}" == "Yes" -a "${array[1]}" == "Yes" -a "${array[2]}" == "0" ] then echo "MySQL is slave is running" else for ((i=0;i<${#errornum[*]};i++)) do if [ "${array[3]}" = "${errornum[$i]}" ];then $mysql_cmd -e "stop slave &&set global sql_slave_skip_counter=1;start slave;" fi done char="MySQL slave is downed" echo "$char" echo "$char"|mail -s "$char" xxxxx@qq.com break fi sleep 30 done
#!/bin/bash USERNAME="monitor" PASSWORD='123456' SLAVE_HOST="127.0.0.1" SLAVE_PORT="3307" MYSQL="/usr/local/mysql/bin/mysql -u$USERNAME -p$PASSWORD" SLAVE="$MYSQL -h $SLAVE_HOST -P $SLAVE_PORT" datetime=`date +"%Y-%m-%d %H:%M:%S"` #查看复制状态 MySQL_Status=`$SLAVE -e "SHOW SLAVE STATUS\G" | grep -E "Running|Seconds_Behind_Master"|head -n3 ` #获取状态 Slave_IO_Running=`echo $MySQL_Status | grep Slave_IO_Running |awk '{print $2}'` Slave_SQL_Running=`echo $MySQL_Status |grep Slave_SQL_Running |awk '{print $4}'` Seconds_Behind_Master=`echo $MySQL_Status |grep Seconds_Behind_Master |awk '{print $6}'` #IO和SQL线程监控 if [ "$Slave_IO_Running" != "Yes" -o "$Slave_SQL_Running" != "Yes" ] then BODY="$datetime Slave_IO_Running:$Slave_IO_Running Slave_SQL_Running:$Slave_SQL_Running Seconds_Behind_Master:$Seconds_Behind_Master 外网MySQL slave replication error!" echo $BODY else #延时监控,不精准 if [ "$Seconds_Behind_Master" -gt "180" ] then BODY="$datetime Slave_IO_Running:$Slave_IO_Running Slave_SQL_Running:$Slave_SQL_Running Seconds_Behind_Master:$Seconds_Behind_Master 外网MySQL slave replication error!" echo $BODY fi fi
示例一: cat check_mysql_health #!/bin/sh slave_is=($(mysql -S /tmp/mysql3307.sock -uroot -e "show slave status\G"|grep "Slave_.*_Running" |awk '{print $2}')) if [ "${slave_is[0]}" = "Yes" -a "${slave_is[1]}" = "Yes" ] then echo "OK SDK-slave3307 connect master3306 is running" exit 0 else echo "Critical SDK-slave3307 connect master3306 is not running" exit 2 fi 示例二: [root@slavedb test]# cat Check_Mysql_Synchronization.sh #!/bin/bash MYUSER=root MYSOCKET=/tmp/mysql3306.sock MYPASSWD= MYLOGIN="mysql -S $MYSOCKET -u$MYUSER " ERROR=(1158 1159 1008 1007 1062 1050) check_status(){ STATUS=($($MYLOGIN -e "show slave status\G"|egrep "Slave_SQL_Running|Slave_IO_Running|Seconds_Behind_Master|Last_SQL_Errno"|awk '{print $NF}')) if [ "${STATUS[0]}" = "Yes" -a "${STATUS[1]}" = "Yes" -a "${STATUS[2]} = 0" ];then echo "Mysql Slave is Ok!" CHECK_NUM=0 return $CHECK_NUM else CHECK_NUM=1 return $CHECK_NUM fi } check_error(){ check_status if [ $? -eq 1 ];then for ((i=0;i<${#ERROR[*]};i++));do if [ "${ERROR[i]}" == "${STATUS[3]}" ];then $MYLOGIN -e "stop slave;" $MYLOGIN -e "set global sql_slave_skip_counter=1" $MYLOGIN -e "start slave;" fi done fi } check_again(){ STATUS=($($MYLOGIN -e "show slave status\G"|egrep "Slave_SQL_Running|Slave_IO_Running|Seconds_Behind_Master|Last_SQL_Errno"|awk '{print $NF}')) check_status >/dev/null 2&>1 if [ $? -eq 1 ];then echo "Mysql slave is fail $(date +%F)" fi } main(){ while true;do check_error check_again sleep 10 done } main
发表评论