MHA高可用工作原理 1.监控节点,通过配置文件获取所有节点信息 2.主库宕机 1)如果判断从库GTID或者position号,数据有差距,最接近于master的从库会选为备选主 2)如果两个从库数据一致,按照配置文件的顺序进行选择备选主 3)如果设定有权重,按照权重强制指定备选主(默认如果一个slave落后master100M的话,即使有权重也会失效,如果配合check_repl_delay=0的话,即使落后100M,也强制选定为备主) 3.数据补偿 1)当ssh能连接时,从库对比主库GTID或者position号,立即将二进制日志保存至各个节点,并且应用 2)当ssh不能连接时,对比从库之间的relaylog的差异 4.failover 将被选主身份切换,并且对外提供服务,其余从库和新的主库建立主从 5.VIP漂移 配置vip,自动切换到新主库上,直接配置在配置文件中,需要一个脚本 6.binlog server 二次数据补偿,一台额外的机器,实时同步主库的binlog,直接配置在配置文件中
配置
node节点:db01(10.0.0.51),db02(10.0.0.52),db03(10.0.0.53) manager节点:m01(10.0.0.61) 搭建好一主两从,我选db01为主,并且都以GTID复制
关键程序进行软连接,我是通过tar包安装的MySQL5.7.20 ln -s /app/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog ln -s /app/mysql/bin/mysql /usr/bin/mysql 配置互信 [root@m01 ~]# ssh-keygen [root@m01 ~]# cd /root/.ssh/ [root@m01 .ssh]# mv id_rsa.pub authorized_keys [root@m01 .ssh]# scp -r /root/.ssh 10.0.0.51:/root [root@m01 .ssh]# scp -r /root/.ssh 10.0.0.52:/root [root@m01 .ssh]# scp -r /root/.ssh 10.0.0.53:/root 所有节点安装node依赖包 [root@m01 .ssh]# yum install perl-DBD-MySQL -y [root@m01 .ssh]# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm 在主库中创建mha专用用户 db01 [(none)]> grant all privileges on *.* to mha@'10.0.0.%' identified by 'mha'; db01 [(none)]>select user,host from mysql.user; +---------------+-----------+ | user | host | +---------------+-----------+ | root | % | | aaaa | 10.0.0.% | | mha | 10.0.0.% | | mysql.session | localhost | | mysql.sys | localhost | | root | localhost | +---------------+-----------+ 6 rows in set (0.00 sec) node节点安装结束
下载manager依赖并安装 [root@m01 ~]# yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes [root@m01 ~]# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm 准备配置文件目录 [root@m01 ~]# mkdir -p /etc/mha 创建日志目录 [root@m01 ~]# mkdir -p /var/log/mha/app1 编辑MHA配置文件 [root@m01 ~]# vim /etc/mha/app1.cnf [binlog1] hostname=10.0.0.61 binlog server要放在哪儿 master_binlog_dir=/data/binlog2 日志存放路径 no_master=1 [server default] manager_log=/var/log/mha/app1/manager manager_workdir=/var/log/mha/app1 master_binlog_dir=/data/mysql/binlog master_ip_failover_script=/usr/local/bin/master_ip_failover vip脚本的存放路径 password=mha ping_interval=2 repl_password=123 主从复制专用用户密码 repl_user=aaaa 主从复制专用用户名 ssh_user=root user=mha [server1] hostname=10.0.0.51 port=3306 [server2] hostname=10.0.0.52 port=3306 [server3] hostname=10.0.0.53 port=3306 创建binlog server路径 mkdir /data/binlog2 chown -R mysql.mysql /data/* 拉取日志 [root@m01 ~]# cd /data/binlog2/ [root@m01 ~]# mysqlbinlog -R --host=10.0.0.51 --user=mha --password=mha --raw --stop-never mysql-bin.000001 & [root@m01 binlog2]# ll total 696 -rw-r----- 1 mysql mysql 690968 Jul 4 09:01 mysql-bin.000001 -rw-r----- 1 mysql mysql 194 Jul 4 09:01 mysql-bin.000002 -rw-r----- 1 mysql mysql 954 Jul 4 09:01 mysql-bin.000003 -rw-r----- 1 mysql mysql 371 Jul 4 09:01 mysql-bin.000004 -rw-r----- 1 mysql mysql 257 Jul 4 09:01 mysql-bin.000005 -rw-r----- 1 mysql mysql 234 Jul 4 09:01 mysql-bin.000006 上传vip脚本到/usr/local/bin/ [root@m01 binlog2]# cd /usr/local/bin/ [root@m01 bin]# ll -rwxr-xr-x 1 root root 2166 Jul 3 17:33 master_ip_failover [root@m01 bin]# chmod +x master_ip_failover 第一次使用vip要手动在主机创建 [root@db01 ~]# ifconfig eth0:1 10.0.0.55/24在主库 manager配置完成,这里直接把MHA的binlog server 和VIP功能加进去了
启动前检测所有节点 [root@m01 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf Thu Jul 4 09:56:22 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Thu Jul 4 09:56:22 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf.. Thu Jul 4 09:56:22 2019 - [info] Reading server configuration from /etc/mha/app1.cnf.. Thu Jul 4 09:56:22 2019 - [info] Starting SSH connection tests.. Thu Jul 4 09:56:23 2019 - [debug] Thu Jul 4 09:56:22 2019 - [debug] Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.52(10.0.0.52:22).. Thu Jul 4 09:56:23 2019 - [debug] ok. Thu Jul 4 09:56:23 2019 - [debug] Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.53(10.0.0.53:22).. Thu Jul 4 09:56:23 2019 - [debug] ok. Thu Jul 4 09:56:24 2019 - [debug] Thu Jul 4 09:56:23 2019 - [debug] Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.51(10.0.0.51:22).. Thu Jul 4 09:56:23 2019 - [debug] ok. Thu Jul 4 09:56:23 2019 - [debug] Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.53(10.0.0.53:22).. Thu Jul 4 09:56:24 2019 - [debug] ok. Thu Jul 4 09:56:25 2019 - [debug] Thu Jul 4 09:56:23 2019 - [debug] Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.51(10.0.0.51:22).. Thu Jul 4 09:56:24 2019 - [debug] ok. Thu Jul 4 09:56:24 2019 - [debug] Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.52(10.0.0.52:22).. Thu Jul 4 09:56:24 2019 - [debug] ok. Thu Jul 4 09:56:25 2019 - [info] All SSH connection tests passed successfully. 检测主从状态 [root@m01 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf MySQL Replication Health is OK. 一切正常,启动MHA [root@m01 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 & 检查运行状态 [root@m01 ~]# masterha_check_status --conf=/etc/mha/app1.cnf app1 (pid:1748) is running(0:PING_OK), master:10.0.0.51
模拟故障
把主库kill掉 [root@db01 ~]# pkill mysqld 在db03上查看状态,自动把db02当做主库并自动构建主从 db03 [(none)]>show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.0.0.52 Master_User: aaaa Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000007 Read_Master_Log_Pos: 234 Relay_Log_File: db03-relay-bin.000002 Relay_Log_Pos: 367 Relay_Master_Log_File: mysql-bin.000007 Slave_IO_Running: Yes Slave_SQL_Running: Yes 查看db02的vip [root@db02 ~]# ip a 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 00:0c:29:90:b8:9d brd ff:ff:ff:ff:ff:ff inet 10.0.0.52/24 brd 10.0.0.255 scope global noprefixroute eth0 valid_lft forever preferred_lft forever inet 10.0.0.55/24 brd 10.0.0.255 scope global secondary eth0:1
恢复
启动db01mysql,构建主从,db02为主 [root@db01 ~]# systemctl start mysqld.service 我们可以在mha日志内找到db03的切换主库的命令,可以直接复制过来 [root@m01 ~]# cat /var/log/mha/app1/manager |grep -n 'CHANGE MASTER' 189:Wed Jul 3 21:13:47 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.51', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='aaaa', MASTER_PASSWORD='xxx'; 215:Wed Jul 3 21:13:47 2019 - [info] Executed CHANGE MASTER. 486:Thu Jul 4 09:59:39 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.52', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='aaaa', MASTER_PASSWORD='xxx'; 512:Thu Jul 4 09:59:39 2019 - [info] Executed CHANGE MASTER. 486行 db01 [(none)]>CHANGE MASTER TO MASTER_HOST='10.0.0.52', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='aaaa', MASTER_PASSWORD='123'; Query OK, 0 rows affected, 2 warnings (0.01 sec) db01 [(none)]>start slave; Query OK, 0 rows affected (0.00 sec) db01 [(none)]>show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.0.0.52 Master_User: aaaa Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000007 Read_Master_Log_Pos: 234 Relay_Log_File: db01-relay-bin.000002 Relay_Log_Pos: 367 Relay_Master_Log_File: mysql-bin.000007 Slave_IO_Running: Yes Slave_SQL_Running: Yes MHA挂掉,binlog server也要重新布置,把原来的日志删掉,重新拉取新主库的日志 [root@m01 binlog2]# rm -rf /data/binlog2/* [root@m01 binlog2]# mysqlbinlog -R --host=10.0.0.52 --user=mha --password=mha --raw --stop-never mysql-bin.000001 & [root@m01 binlog2]# ll total 700 -rw-r----- 1 root root 690913 Jul 4 10:12 mysql-bin.000001 -rw-r----- 1 root root 1108 Jul 4 10:12 mysql-bin.000002 -rw-r----- 1 root root 281 Jul 4 10:12 mysql-bin.000003 -rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000004 -rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000005 -rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000006 -rw-r----- 1 root root 234 Jul 4 10:12 mysql-bin.000007 进行切换后,配置文件内自动清除发生故障的sever,所以启动之前添加配置文件server 准备完毕,启动MHA [root@m01 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 & [2] 4060 [root@m01 ~]# masterha_check_status --conf=/etc/mha/app1.cnf app1 (pid:4060) is running(0:PING_OK), master:10.0.0.52 检查binlog server db02 [(none)]>db02 [(none)]>show master status; +------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+ | mysql-bin.000008 | 234 | | | 8d397b30-8791-11e9-acbd-000c2902fc28:1-17, c74ed2c4-998e-11e9-b9ee-000c2990b89d:1 | +------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+ 1 row in set (0.00 sec) db02 [(none)]>flush logs; Query OK, 0 rows affected (0.00 sec) db02 [(none)]>show master status; +------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+ | mysql-bin.000009 | 234 | | | 8d397b30-8791-11e9-acbd-000c2902fc28:1-17, c74ed2c4-998e-11e9-b9ee-000c2990b89d:1 | +------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+ 1 row in set (0.00 sec) 已复制成功 [root@m01 ~]# ll /data/binlog2/ total 708 -rw-r----- 1 root root 690913 Jul 4 10:12 mysql-bin.000001 -rw-r----- 1 root root 1108 Jul 4 10:12 mysql-bin.000002 -rw-r----- 1 root root 281 Jul 4 10:12 mysql-bin.000003 -rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000004 -rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000005 -rw-r----- 1 root root 257 Jul 4 10:12 mysql-bin.000006 -rw-r----- 1 root root 281 Jul 4 10:16 mysql-bin.000007 -rw-r----- 1 root root 281 Jul 4 10:16 mysql-bin.000008 -rw-r----- 1 root root 234 Jul 4 10:16 mysql-bin.000009 至此,MHA的高可用功能,binlog server功能和vip漂移功能,部署并测试成功