MHA高可用工作原理 1.监控节点,通过配置文件获取所有节点信息 2.主库宕机 1)如果判断从库GTID或者position号,数据有差距,最接近于master的从库会选为备选主 2)如果两个从库数据一致,按照配置文件的顺序进行选择备选主 3)如果设定有权重,按照权重强制指定备选主(默认如果一个slave落后master100M的话,即使有权重也会失效,如果配合check_repl_delay=0的话,即使落后100M,也强制选定为备主) 3.数据补偿 1)当ssh能连接时,从库对比主库GTID或者position号,立即将二进制日志保存至各个节点,并且应用 2)当ssh不能连接时,对比从库之间的relaylog的差异 4.failover 将被选主身份切换,并且对外提供服务,其余从库和新的主库建立主从 5.VIP漂移 配置vip,自动切换到新主库上,直接配置在配置文件中,需要一个脚本 6.binlog server 二次数据补偿,一台额外的机器,实时同步主库的binlog,直接配置在配置文件中
配置
node节点:db01(10.0.0.51),db02(10.0.0.52),db03(10.0.0.53)
manager节点:m01(10.0.0.61)
搭建好一主两从,我选db01为主,并且都以GTID复制
关键程序进行软连接,我是通过tar包安装的MySQL5.7.20
ln -s /app/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /app/mysql/bin/mysql /usr/bin/mysql
配置互信
[root@m01 ~]# ssh-keygen
[root@m01 ~]# cd /root/.ssh/
[root@m01 .ssh]# mv id_rsa.pub authorized_keys
[root@m01 .ssh]# scp -r /root/.ssh 10.0.0.51:/root
[root@m01 .ssh]# scp -r /root/.ssh 10.0.0.52:/root
[root@m01 .ssh]# scp -r /root/.ssh 10.0.0.53:/root
所有节点安装node依赖包
[root@m01 .ssh]# yum install perl-DBD-MySQL -y
[root@m01 .ssh]# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
在主库中创建mha专用用户
db01 [(none)]> grant all privileges on *.* to mha@'10.0.0.%' identified by 'mha';
db01 [(none)]>select user,host from mysql.user;
+---------------+-----------+
| user | host |
+---------------+-----------+
| root | % |
| aaaa | 10.0.0.% |
| mha | 10.0.0.% |
| mysql.session | localhost |
| mysql.sys | localhost |
| root | localhost |
+---------------+-----------+
6 rows in set (0.00 sec)
node节点安装结束
下载manager依赖并安装
[root@m01 ~]# yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
[root@m01 ~]# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
准备配置文件目录
[root@m01 ~]# mkdir -p /etc/mha
创建日志目录
[root@m01 ~]# mkdir -p /var/log/mha/app1
编辑MHA配置文件
[root@m01 ~]# vim /etc/mha/app1.cnf
[binlog1]
hostname=10.0.0.61 binlog server要放在哪儿
master_binlog_dir=/data/binlog2 日志存放路径
no_master=1
[server default]
manager_log=/var/log/mha/app1/manager
manager_workdir=/var/log/mha/app1
master_binlog_dir=/data/mysql/binlog
master_ip_failover_script=/usr/local/bin/master_ip_failover vip脚本的存放路径
password=mha
ping_interval=2
repl_password=123 主从复制专用用户密码
repl_user=aaaa 主从复制专用用户名
ssh_user=root
user=mha
[server1]
hostname=10.0.0.51
port=3306
[server2]
hostname=10.0.0.52
port=3306
[server3]
hostname=10.0.0.53
port=3306
创建binlog server路径
mkdir /data/binlog2
chown -R mysql.mysql /data/*
拉取日志
[root@m01 ~]# cd /data/binlog2/
[root@m01 ~]# mysqlbinlog -R --host=10.0.0.51 --user=mha --password=mha --raw --stop-never mysql-bin.000001 &
[root@m01 binlog2]# ll
total 696
-rw-r----- 1 mysql mysql 690968 Jul 4 09:01 mysql-bin.000001
-rw-r----- 1 mysql mysql 194 Jul 4 09:01 mysql-bin.000002
-rw-r----- 1 mysql mysql 954 Jul 4 09:01 mysql-bin.000003
-rw-r----- 1 mysql mysql 371 Jul 4 09:01 mysql-bin.000004
-rw-r----- 1 mysql mysql 257 Jul 4 09:01 mysql-bin.000005
-rw-r----- 1 mysql mysql 234 Jul 4 09:01 mysql-bin.000006
上传vip脚本到/usr/local/bin/
[root@m01 binlog2]# cd /usr/local/bin/
[root@m01 bin]# ll
-rwxr-xr-x 1 root root 2166 Jul 3 17:33 master_ip_failover
[root@m01 bin]# chmod +x master_ip_failover
第一次使用vip要手动在主机创建
[root@db01 ~]# ifconfig eth0:1 10.0.0.55/24在主库
manager配置完成,这里直接把MHA的binlog server 和VIP功能加进去了
启动前检测所有节点
[root@m01 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Thu Jul 4 09:56:22 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 4 09:56:22 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu Jul 4 09:56:22 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu Jul 4 09:56:22 2019 - [info] Starting SSH connection tests..
Thu Jul 4 09:56:23 2019 - [debug]
Thu Jul 4 09:56:22 2019 - [debug] Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.52(10.0.0.52:22)..
Thu Jul 4 09:56:23 2019 - [debug] ok.
Thu Jul 4 09:56:23 2019 - [debug] Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.53(10.0.0.53:22)..
Thu Jul 4 09:56:23 2019 - [debug] ok.
Thu Jul 4 09:56:24 2019 - [debug]
Thu Jul 4 09:56:23 2019 - [debug] Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.51(10.0.0.51:22)..
Thu Jul 4 09:56:23 2019 - [debug] ok.
Thu Jul 4 09:56:23 2019 - [debug] Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.53(10.0.0.53:22)..
Thu Jul 4 09:56:24 2019 - [debug] ok.
Thu Jul 4 09:56:25 2019 - [debug]
Thu Jul 4 09:56:23 2019 - [debug] Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.51(10.0.0.51:22)..
Thu Jul 4 09:56:24 2019 - [debug] ok.
Thu Jul 4 09:56:24 2019 - [debug] Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.52(10.0.0.52:22)..
Thu Jul 4 09:56:24 2019 - [debug] ok.
Thu Jul 4 09:56:25 2019 - [info] All SSH connection tests passed successfully.
检测主从状态
[root@m01 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
MySQL Replication Health is OK.
一切正常,启动MHA
[root@m01 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
检查运行状态
[root@m01 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:1748) is running(0:PING_OK), master:10.0.0.51
模拟故障
把主库kill掉
[root@db01 ~]# pkill mysqld
在db03上查看状态,自动把db02当做主库并自动构建主从
db03 [(none)]>show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.52
Master_User: aaaa
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000007
Read_Master_Log_Pos: 234
Relay_Log_File: db03-relay-bin.000002
Relay_Log_Pos: 367
Relay_Master_Log_File: mysql-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
查看db02的vip
[root@db02 ~]# ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:90:b8:9d brd ff:ff:ff:ff:ff:ff
inet 10.0.0.52/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.0.0.55/24 brd 10.0.0.255 scope global secondary eth0:1
恢复
启动db01mysql,构建主从,db02为主
[root@db01 ~]# systemctl start mysqld.service
我们可以在mha日志内找到db03的切换主库的命令,可以直接复制过来
[root@m01 ~]# cat /var/log/mha/app1/manager |grep -n 'CHANGE MASTER'
189:Wed Jul 3 21:13:47 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.51', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='aaaa', MASTER_PASSWORD='xxx';
215:Wed Jul 3 21:13:47 2019 - [info] Executed CHANGE MASTER.
486:Thu Jul 4 09:59:39 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.52', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='aaaa', MASTER_PASSWORD='xxx';
512:Thu Jul 4 09:59:39 2019 - [info] Executed CHANGE MASTER.
486行
db01 [(none)]>CHANGE MASTER TO MASTER_HOST='10.0.0.52', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='aaaa', MASTER_PASSWORD='123';
Query OK, 0 rows affected, 2 warnings (0.01 sec)
db01 [(none)]>start slave;
Query OK, 0 rows affected (0.00 sec)
db01 [(none)]>show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.52
Master_User: aaaa
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000007
Read_Master_Log_Pos: 234
Relay_Log_File: db01-relay-bin.000002
Relay_Log_Pos: 367
Relay_Master_Log_File: mysql-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
MHA挂掉,binlog server也要重新布置,把原来的日志删掉,重新拉取新主库的日志
[root@m01 binlog2]# rm -rf /data/binlog2/*
[root@m01 binlog2]# mysqlbinlog -R --host=10.0.0.52 --user=mha --password=mha --raw --stop-never mysql-bin.000001 &
[root@m01 binlog2]# ll
total 700
-rw-r----- 1 root root 690913 Jul 4 10:12 mysql-bin.000001
-rw-r----- 1 root root 1108 Jul 4 10:12 mysql-bin.000002
-rw-r----- 1 root root 281 Jul 4 10:12 mysql-bin.000003
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000004
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000005
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000006
-rw-r----- 1 root root 234 Jul 4 10:12 mysql-bin.000007
进行切换后,配置文件内自动清除发生故障的sever,所以启动之前添加配置文件server
准备完毕,启动MHA
[root@m01 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[2] 4060
[root@m01 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4060) is running(0:PING_OK), master:10.0.0.52
检查binlog server
db02 [(none)]>db02 [(none)]>show master status;
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
| mysql-bin.000008 | 234 | | | 8d397b30-8791-11e9-acbd-000c2902fc28:1-17,
c74ed2c4-998e-11e9-b9ee-000c2990b89d:1 |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
db02 [(none)]>flush logs;
Query OK, 0 rows affected (0.00 sec)
db02 [(none)]>show master status;
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
| mysql-bin.000009 | 234 | | | 8d397b30-8791-11e9-acbd-000c2902fc28:1-17,
c74ed2c4-998e-11e9-b9ee-000c2990b89d:1 |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
已复制成功
[root@m01 ~]# ll /data/binlog2/
total 708
-rw-r----- 1 root root 690913 Jul 4 10:12 mysql-bin.000001
-rw-r----- 1 root root 1108 Jul 4 10:12 mysql-bin.000002
-rw-r----- 1 root root 281 Jul 4 10:12 mysql-bin.000003
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000004
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000005
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000006
-rw-r----- 1 root root 281 Jul 4 10:16 mysql-bin.000007
-rw-r----- 1 root root 281 Jul 4 10:16 mysql-bin.000008
-rw-r----- 1 root root 234 Jul 4 10:16 mysql-bin.000009
至此,MHA的高可用功能,binlog server功能和vip漂移功能,部署并测试成功
把主库kill掉 [root@db01 ~]# pkill mysqld 在db03上查看状态,自动把db02当做主库并自动构建主从 db03 [(none)]>show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.0.0.52 Master_User: aaaa Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000007 Read_Master_Log_Pos: 234 Relay_Log_File: db03-relay-bin.000002 Relay_Log_Pos: 367 Relay_Master_Log_File: mysql-bin.000007 Slave_IO_Running: Yes Slave_SQL_Running: Yes 查看db02的vip [root@db02 ~]# ip a 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 00:0c:29:90:b8:9d brd ff:ff:ff:ff:ff:ff inet 10.0.0.52/24 brd 10.0.0.255 scope global noprefixroute eth0 valid_lft forever preferred_lft forever inet 10.0.0.55/24 brd 10.0.0.255 scope global secondary eth0:1
恢复
启动db01mysql,构建主从,db02为主
[root@db01 ~]# systemctl start mysqld.service
我们可以在mha日志内找到db03的切换主库的命令,可以直接复制过来
[root@m01 ~]# cat /var/log/mha/app1/manager |grep -n 'CHANGE MASTER'
189:Wed Jul 3 21:13:47 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.51', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='aaaa', MASTER_PASSWORD='xxx';
215:Wed Jul 3 21:13:47 2019 - [info] Executed CHANGE MASTER.
486:Thu Jul 4 09:59:39 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.52', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='aaaa', MASTER_PASSWORD='xxx';
512:Thu Jul 4 09:59:39 2019 - [info] Executed CHANGE MASTER.
486行
db01 [(none)]>CHANGE MASTER TO MASTER_HOST='10.0.0.52', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='aaaa', MASTER_PASSWORD='123';
Query OK, 0 rows affected, 2 warnings (0.01 sec)
db01 [(none)]>start slave;
Query OK, 0 rows affected (0.00 sec)
db01 [(none)]>show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.52
Master_User: aaaa
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000007
Read_Master_Log_Pos: 234
Relay_Log_File: db01-relay-bin.000002
Relay_Log_Pos: 367
Relay_Master_Log_File: mysql-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
MHA挂掉,binlog server也要重新布置,把原来的日志删掉,重新拉取新主库的日志
[root@m01 binlog2]# rm -rf /data/binlog2/*
[root@m01 binlog2]# mysqlbinlog -R --host=10.0.0.52 --user=mha --password=mha --raw --stop-never mysql-bin.000001 &
[root@m01 binlog2]# ll
total 700
-rw-r----- 1 root root 690913 Jul 4 10:12 mysql-bin.000001
-rw-r----- 1 root root 1108 Jul 4 10:12 mysql-bin.000002
-rw-r----- 1 root root 281 Jul 4 10:12 mysql-bin.000003
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000004
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000005
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000006
-rw-r----- 1 root root 234 Jul 4 10:12 mysql-bin.000007
进行切换后,配置文件内自动清除发生故障的sever,所以启动之前添加配置文件server
准备完毕,启动MHA
[root@m01 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[2] 4060
[root@m01 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4060) is running(0:PING_OK), master:10.0.0.52
检查binlog server
db02 [(none)]>db02 [(none)]>show master status;
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
| mysql-bin.000008 | 234 | | | 8d397b30-8791-11e9-acbd-000c2902fc28:1-17,
c74ed2c4-998e-11e9-b9ee-000c2990b89d:1 |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
db02 [(none)]>flush logs;
Query OK, 0 rows affected (0.00 sec)
db02 [(none)]>show master status;
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
| mysql-bin.000009 | 234 | | | 8d397b30-8791-11e9-acbd-000c2902fc28:1-17,
c74ed2c4-998e-11e9-b9ee-000c2990b89d:1 |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
已复制成功
[root@m01 ~]# ll /data/binlog2/
total 708
-rw-r----- 1 root root 690913 Jul 4 10:12 mysql-bin.000001
-rw-r----- 1 root root 1108 Jul 4 10:12 mysql-bin.000002
-rw-r----- 1 root root 281 Jul 4 10:12 mysql-bin.000003
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000004
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000005
-rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000006
-rw-r----- 1 root root 281 Jul 4 10:16 mysql-bin.000007
-rw-r----- 1 root root 281 Jul 4 10:16 mysql-bin.000008
-rw-r----- 1 root root 234 Jul 4 10:16 mysql-bin.000009
至此,MHA的高可用功能,binlog server功能和vip漂移功能,部署并测试成功