Learning HA on Linux with inthirties (Heartbeat: Applying and Testing Heartbeat)

Reader submission, 2022-12-02


Verifying the failover and takeover implemented by heartbeat

Now let's test whether the failover and takeover we built with heartbeat actually work.

First stop the heartbeat service on both nodes with service heartbeat stop.

Stop heartbeat on node 2 first:

    [root@~]# service heartbeat stop
    Stopping High-Availability services: [ OK ]

Then stop heartbeat on node 1:

    [root@~]# service heartbeat stop
    Stopping High-Availability services: [ OK ]

Check ifconfig on both nodes:

    [root@~]# ifconfig eth0:0
    eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:01:1A:6C
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              Interrupt:185 Base address:0x1400

    [root@~]# ifconfig eth0:0
    eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:56:84:0F
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              Interrupt:185 Base address:0x1400

As you can see, eth0:0 is inactive on both nodes: neither output shows an IP address.

Now monitor /var/log/ha-log. Clear the log and run tail -s 1 -n 50 -f /var/log/ha-log on both nodes:

    [root@~]# echo "" > /var/log/ha-log
    [root@~]# tail -s 1 -n 50 -f /var/log/ha-log

Keep watching the log for changes, then start heartbeat on node 1:

    [root@~]# service heartbeat start
    Starting High-Availability services:
    2010/01/31_08:42:43 INFO: IPaddr Resource is stopped
    [ OK ]

The log on node 1 changes:

    heartbeat[8952]: 2010/01/31_08:42:43 WARN: Traditional compression selected. Realtime behavior will likely be impacted(!)
    heartbeat[8952]: 2010/01/31_08:42:43 info: See for more information.
    heartbeat[8952]: 2010/01/31_08:42:43 WARN: Logging daemon is disabled --enabling logging daemon is recommended
    heartbeat[8952]: 2010/01/31_08:42:43 info: **************************
    heartbeat[8952]: 2010/01/31_08:42:43 info: Configuration validated. Starting heartbeat 2.0.4
    heartbeat[8953]: 2010/01/31_08:42:43 info: heartbeat: version 2.0.4
    heartbeat[8953]: 2010/01/31_08:42:43 info: Heartbeat generation: 7
    heartbeat[8953]: 2010/01/31_08:42:43 info: G_main_add_TriggerHandler: Added signal manual handler
    heartbeat[8953]: 2010/01/31_08:42:43 info: G_main_add_TriggerHandler: Added signal manual handler
    heartbeat[8953]: 2010/01/31_08:42:43 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
    heartbeat[8953]: 2010/01/31_08:42:43 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
    heartbeat[8953]: 2010/01/31_08:42:43 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
    heartbeat[8953]: 2010/01/31_08:42:43 info: glib: ping heartbeat started.
    heartbeat[8953]: 2010/01/31_08:42:43 info: G_main_add_SignalHandler: Added signal handler for signal 17
    heartbeat[8953]: 2010/01/31_08:42:43 info: Local status now set to: 'up'
    heartbeat[8953]: 2010/01/31_08:42:43 info: Exiting write_hostcachedata process 8961 returned rc 0.
    heartbeat[8953]: 2010/01/31_08:42:44 info: Link up.
    heartbeat[8953]: 2010/01/31_08:42:44 info: Link 192.168.137.1:192.168.137.1 up.
    heartbeat[8953]: 2010/01/31_08:42:44 info: Status update for node 192.168.137.1: status ping

The service started successfully, but 192.168.137.37 has not appeared yet and the service cannot be reached either. Remember the initdead value we configured earlier: after startup the host has to wait 120 seconds. So wait 2 minutes, then look at the log again:

    heartbeat[8953]: 2010/01/31_08:44:44 WARN: node is dead
    heartbeat[8953]: 2010/01/31_08:44:44 info: Comm_now_up(): updating status to active
    heartbeat[8953]: 2010/01/31_08:44:44 info: Local status now set to: 'active'
    heartbeat[8953]: 2010/01/31_08:44:44 info: Starting child client "/usr/lib64/heartbeat/ipfail" (502,503)
    heartbeat[8962]: 2010/01/31_08:44:44 info: Starting "/usr/lib64/heartbeat/ipfail" as uid 502 gid 503 (pid 8962)
    heartbeat[8953]: 2010/01/31_08:44:44 WARN: No STONITH device configured.
    heartbeat[8953]: 2010/01/31_08:44:44 WARN: Shared disks are not protected.
    heartbeat[8953]: 2010/01/31_08:44:44 info: Resources being acquired from
    harc[8963]: 2010/01/31_08:44:44 info: Running /etc/ha.d/rc.d/status status
    mach_down[8983]: 2010/01/31_08:44:44 info: /usr/lib64/heartbeat/mach_down: nice_failback: foreign resources acquired
    mach_down[8983]: 2010/01/31_08:44:44 info: mach_down takeover complete for node
    heartbeat[8953]: 2010/01/31_08:44:44 info: mach_down takeover complete.
    heartbeat[8953]: 2010/01/31_08:44:44 info: Initial resource acquisition complete (mach_down)
    IPaddr[9019]: 2010/01/31_08:44:45 INFO: IPaddr Resource is stopped
    heartbeat[8964]: 2010/01/31_08:44:45 info: Local Resource acquisition completed.
    harc[9126]: 2010/01/31_08:44:45 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
    ip-request-resp[9126]: 2010/01/31_08:44:45 received ip-request-resp 192.168.137.37 OK yes
    ResourceManager[9141]: 2010/01/31_08:44:45 info: Acquiring resource group: 192.168.137.37 ipvsadm
    IPaddr[9165]: 2010/01/31_08:44:45 INFO: IPaddr Resource is stopped
    ResourceManager[9141]: 2010/01/31_08:44:45 info: Running /etc/ha.d/resource.d/IPaddr 192.168.137.37 start
    IPaddr[9354]: 2010/01/31_08:44:46 INFO: /sbin/ifconfig eth0:0 192.168.137.37 netmask 255.255.255.0
    IPaddr[9354]: 2010/01/31_08:44:46 INFO: Sending Gratuitous Arp for 192.168.137.37 on eth0:0 [eth0]
    IPaddr[9354]: 2010/01/31_08:44:46 INFO: /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.137.37 eth0 192.168.137.37 auto 192.168.137.37 ffffffffffff
    IPaddr[9284]: 2010/01/31_08:44:46 INFO: IPaddr Success
    ResourceManager[9141]: 2010/01/31_08:44:46 info: Running /etc/init.d/ipvsadm start
    ResourceManager[9141]: 2010/01/31_08:44:47 info: Running /etc/init.d/start
    heartbeat[8953]: 2010/01/31_08:44:54 info: Local Resource acquisition completed. (none)
    heartbeat[8953]: 2010/01/31_08:44:54 info: local resource transition completed.
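The delay between the node coming up and the takeover can be checked directly from the ha-log timestamps. The following is only a minimal sketch: the helper name seconds_between is made up for illustration, and it assumes the timestamp layout seen in the log above (YYYY/MM/DD_HH:MM:SS).

```python
from datetime import datetime

# Timestamp layout as it appears in /var/log/ha-log, e.g. 2010/01/31_08:42:44
HA_LOG_TIME_FORMAT = "%Y/%m/%d_%H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Return the whole seconds elapsed from one ha-log timestamp to another."""
    t0 = datetime.strptime(start, HA_LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, HA_LOG_TIME_FORMAT)
    return int((t1 - t0).total_seconds())


if __name__ == "__main__":
    # "Link up" entry versus the first takeover entry in the node 1 log
    print(seconds_between("2010/01/31_08:42:44", "2010/01/31_08:44:44"))  # prints 120
```

The result can be compared with the initdead value in the configuration; a much larger gap would suggest that something other than initdead delayed the takeover.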
There it is, exactly what we were waiting for. Look at the timestamps in the log: the earlier entries are at 2010/01/31_08:42:44 and the later ones at 2010/01/31_08:44:44, exactly 2 minutes apart. Computers are like that, accurate to the second.

Now look at eth0:0:

    [root@~]# ifconfig eth0:0
    eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:56:84:0F
              inet addr:192.168.137.37  Bcast:192.168.137.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              Interrupt:185 Base address:0x1400

OK, the service can now be reached.

Next, let's look at node 2. It has not been started yet, so its log is still empty. Start node 2 now:

    [root@~]# service heartbeat start
    Starting High-Availability services:
    2010/01/31_01:41:28 INFO: IPaddr Resource is stopped
    [ OK ]

Node 2 starts. Because node 1 is healthy at this point, node 2 does not take over the resources; it waits for its chance to step into the spotlight. Here is its log:

    heartbeat[13844]: 2010/01/31_01:41:29 WARN: Traditional compression selected. Realtime behavior will likely be impacted(!)
    heartbeat[13844]: 2010/01/31_01:41:29 info: See for more information.
    heartbeat[13844]: 2010/01/31_01:41:29 WARN: Logging daemon is disabled --enabling logging daemon is recommended
    heartbeat[13844]: 2010/01/31_01:41:29 info: **************************
    heartbeat[13844]: 2010/01/31_01:41:29 info: Configuration validated. Starting heartbeat 2.0.4
    heartbeat[13845]: 2010/01/31_01:41:29 info: heartbeat: version 2.0.4
    heartbeat[13845]: 2010/01/31_01:41:29 info: Heartbeat generation: 2
    heartbeat[13845]: 2010/01/31_01:41:29 info: G_main_add_TriggerHandler: Added signal manual handler
    heartbeat[13845]: 2010/01/31_01:41:29 info: G_main_add_TriggerHandler: Added signal manual handler
    heartbeat[13845]: 2010/01/31_01:41:29 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
    heartbeat[13845]: 2010/01/31_01:41:29 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
    heartbeat[13845]: 2010/01/31_01:41:29 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
    heartbeat[13845]: 2010/01/31_01:41:29 info: glib: ping heartbeat started.
    heartbeat[13845]: 2010/01/31_01:41:29 info: G_main_add_SignalHandler: Added signal handler for signal 17
    heartbeat[13845]: 2010/01/31_01:41:29 info: Local status now set to: 'up'
    heartbeat[13845]: 2010/01/31_01:41:29 info: Exiting write_hostcachedata process 13853 returned rc 0.
    heartbeat[13845]: 2010/01/31_01:41:30 info: Link 192.168.137.1:192.168.137.1 up.
    heartbeat[13845]: 2010/01/31_01:41:30 info: Status update for node 192.168.137.1: status ping
    heartbeat[13845]: 2010/01/31_01:41:30 info: Link up.
    heartbeat[13845]: 2010/01/31_01:41:30 info: Link up.
    heartbeat[13845]: 2010/01/31_01:41:30 info: Status update for node status active
    heartbeat[13845]: 2010/01/31_01:41:31 info: Exiting write_hostcachedata process 13855 returned rc 0.
    harc[13854]: 2010/01/31_01:41:31 info: Running /etc/ha.d/rc.d/status status
    heartbeat[13845]: 2010/01/31_01:41:31 info: Comm_now_up(): updating status to active
    heartbeat[13845]: 2010/01/31_01:41:31 info: Local status now set to: 'active'
    heartbeat[13845]: 2010/01/31_01:41:31 info: Starting child client "/usr/lib64/heartbeat/ipfail" (502,503)
    heartbeat[13865]: 2010/01/31_01:41:31 info: Starting "/usr/lib64/heartbeat/ipfail" as uid 502 gid 503 (pid 13865)
    heartbeat[13845]: 2010/01/31_01:41:31 info: remote resource transition completed.
    heartbeat[13845]: 2010/01/31_01:41:31 info: remote resource transition completed.
    heartbeat[13845]: 2010/01/31_01:41:31 info: Local Resource acquisition completed. (none)
    heartbeat[13845]: 2010/01/31_01:41:32 info: wants to go standby [foreign]
    heartbeat[13845]: 2010/01/31_01:41:33 info: standby: acquire [foreign] resources from
    heartbeat[13866]: 2010/01/31_01:41:33 info: acquire local HA resources (standby).
    heartbeat[13866]: 2010/01/31_01:41:33 info: local HA resource acquisition completed (standby).
    heartbeat[13845]: 2010/01/31_01:41:33 info: Standby resource acquisition done [foreign].
    heartbeat[13845]: 2010/01/31_01:41:33 info: Initial resource acquisition complete (auto_failback)
    heartbeat[13845]: 2010/01/31_01:41:34 info: remote resource transition completed.
    heartbeat[13845]: 2010/01/31_01:41:45 info: wants to go standby [foreign]
    heartbeat[13845]: 2010/01/31_01:41:46 info: standby: can take our foreign resources
    heartbeat[13876]: 2010/01/31_01:41:46 info: give up foreign HA resources (standby).
    ResourceManager[13886]: 2010/01/31_01:41:46 info: Releasing resource group: 192.168.137.37 ipvsadm
    ResourceManager[13886]: 2010/01/31_01:41:46 info: Running /etc/init.d/stop
    ResourceManager[13886]: 2010/01/31_01:41:46 ERROR: Return code 1 from /etc/init.d/
    ResourceManager[13886]: 2010/01/31_01:41:47 info: Retrying failed stop operation [
    ResourceManager[13886]: 2010/01/31_01:41:47 info: Running /etc/init.d/stop
    ResourceManager[13886]: 2010/01/31_01:41:48 ERROR: Return code 1 from /etc/init.d/
    ResourceManager[13886]: 2010/01/31_01:41:49 info: Retrying failed stop operation [
    ResourceManager[13886]: 2010/01/31_01:41:49 info: Running /etc/init.d/stop
    ResourceManager[13886]: 2010/01/31_01:41:49 ERROR: Return code 1 from /etc/init.d/
    ResourceManager[13886]: 2010/01/31_01:41:50 info: Retrying failed stop operation [
    ResourceManager[13886]: 2010/01/31_01:41:50 info: Running /etc/init.d/stop
    ResourceManager[13886]: 2010/01/31_01:41:50 ERROR: Return code 1 from /etc/init.d/
    ResourceManager[13886]: 2010/01/31_01:41:51 info: Retrying failed stop operation [
    ResourceManager[13886]: 2010/01/31_01:41:51 info: Running /etc/init.d/stop
    ResourceManager[13886]: 2010/01/31_01:41:52 ERROR: Return code 1 from /etc/init.d/
    ResourceManager[13886]: 2010/01/31_01:41:53 info: Retrying failed stop operation [
    ResourceManager[13886]: 2010/01/31_01:41:53 info: Running /etc/init.d/stop
    ResourceManager[13886]: 2010/01/31_01:41:53 ERROR: Return code 1 from /etc/init.d/
    ResourceManager[13886]: 2010/01/31_01:41:54 info: Retrying failed stop operation [
    ResourceManager[13886]: 2010/01/31_01:41:54 info: Running /etc/init.d/stop
    ResourceManager[13886]: 2010/01/31_01:41:54 ERROR: Return code 1 from /etc/init.d/
    ResourceManager[13886]: 2010/01/31_01:41:56 info: Retrying failed stop operation [
    ResourceManager[13886]: 2010/01/31_01:41:56 info: Running /etc/init.d/stop
    ResourceManager[13886]: 2010/01/31_01:41:56 ERROR: Return code 1 from /etc/init.d/
    ResourceManager[13886]: 2010/01/31_01:41:57 info: Retrying failed stop operation [
    ResourceManager[13886]: 2010/01/31_01:41:57 info: Running /etc/init.d/stop
    ResourceManager[13886]: 2010/01/31_01:41:57 ERROR: Return code 1 from /etc/init.d/
    ResourceManager[13886]: 2010/01/31_01:41:58 info: Retrying failed stop operation [
    ResourceManager[13886]: 2010/01/31_01:41:58 info: Running /etc/init.d/stop
    ResourceManager[13886]: 2010/01/31_01:41:59 ERROR: Return code 1 from /etc/init.d/
    ResourceManager[13886]: 2010/01/31_01:42:00 info: Retrying failed stop operation [
    ResourceManager[13886]: 2010/01/31_01:42:00 info: Running /etc/init.d/stop
    ResourceManager[13886]: 2010/01/31_01:42:00 ERROR: Return code 1 from /etc/init.d/
    ResourceManager[13886]: 2010/01/31_01:42:00 ERROR: Resource script for probably not LSB-compliant.
    ResourceManager[13886]: 2010/01/31_01:42:00 WARN: it (MUST succeed on a stop when already stopped
    ResourceManager[13886]: 2010/01/31_01:42:00 WARN: Machine reboot narrowly avoided!
    ResourceManager[13886]: 2010/01/31_01:42:00 info: Running /etc/init.d/ipvsadm stop
    ResourceManager[13886]: 2010/01/31_01:42:00 info: Running /etc/ha.d/resource.d/IPaddr 192.168.137.37 stop
    IPaddr[14409]: 2010/01/31_01:42:01 INFO: IPaddr Success
    heartbeat[13876]: 2010/01/31_01:42:01 info: foreign HA resource release completed (standby).
    heartbeat[13845]: 2010/01/31_01:42:01 info: Local standby process completed [foreign].
    heartbeat[13845]: 2010/01/31_01:42:02 WARN: 1 lost packet(s) for [[209:211]
    heartbeat[13845]: 2010/01/31_01:42:02 info: remote resource transition completed.
    heartbeat[13845]: 2010/01/31_01:42:02 info: No pkts missing from
    heartbeat[13845]: 2010/01/31_01:42:02 info: Other node completed standby takeover of foreign resources.

The log shows clearly what node 2 did: because node 1 is healthy, node 2 has to shut down its own resources, although our earlier ...

Now check the service on node 2:

    [root@~]# ps -ef | grep

Nothing there. And on node 1:

    [root@~]# ps -ef | grep

A whole list of processes shows up.

In normal operation the virtual IP and the resources live on node 1 (primary). Node 2 (standby) holds neither the virtual IP nor the resources.

Next, let's see what happens when the host runs into trouble. Start with a network outage test: bring down eth0 on node 1 with ifdown, and watch the logs on both nodes.

Run ifdown eth0 on node 1. With its network down we can no longer see node 1's log, but we can see the log on node 2 change:

    heartbeat[13845]: 2010/01/31_01:50:03 info: wants to go standby [all]
    heartbeat[13845]: 2010/01/31_01:50:29 info: standby: acquire [all] resources from
    heartbeat[14550]: 2010/01/31_01:50:29 info: acquire all HA resources (standby).
    ResourceManager[14560]: 2010/01/31_01:50:29 info: Acquiring resource group: 192.168.137.37 ipvsadm
    IPaddr[14584]: 2010/01/31_01:50:29 INFO: IPaddr Resource is stopped
    ResourceManager[14560]: 2010/01/31_01:50:30 info: Running /etc/ha.d/resource.d/IPaddr 192.168.137.37 start
    IPaddr[14773]: 2010/01/31_01:50:30 INFO: /sbin/ifconfig eth0:0 192.168.137.37 netmask 255.255.255.0
    IPaddr[14773]: 2010/01/31_01:50:30 INFO: Sending Gratuitous Arp for 192.168.137.37 on eth0:0 [eth0]
    IPaddr[14773]: 2010/01/31_01:50:30 INFO: /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.137.37 eth0 192.168.137.37 auto 192.168.137.37 ffffffffffff
    IPaddr[14703]: 2010/01/31_01:50:30 INFO: IPaddr Success
    ResourceManager[14560]: 2010/01/31_01:50:30 info: Running /etc/init.d/ipvsadm start
    ResourceManager[14560]: 2010/01/31_01:50:31 info: Running /etc/init.d/start
    heartbeat[14550]: 2010/01/31_01:50:31 info: all HA resource acquisition completed (standby).
    heartbeat[13845]: 2010/01/31_01:50:31 info: Standby resource acquisition done [all].
    heartbeat[13845]: 2010/01/31_01:50:32 info: remote resource transition completed.
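Throughout this test, whether a node holds the virtual IP is read by eye from the ifconfig eth0:0 output. That check could be scripted roughly as follows. This is a sketch under assumptions: the function name alias_ipv4 is hypothetical, and the pattern only matches the older net-tools output format shown in this article ("inet addr:..."), not the format printed by newer ifconfig versions or by the ip command.

```python
import re
from typing import Optional

# Old net-tools ifconfig prints the address as "inet addr:<dotted quad>"
INET_ADDR = re.compile(r"inet addr:(\d{1,3}(?:\.\d{1,3}){3})")


def alias_ipv4(ifconfig_output: str) -> Optional[str]:
    """Return the IPv4 address found in `ifconfig <iface>` output, or None if there is none."""
    match = INET_ADDR.search(ifconfig_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Output captured while the alias carries no address (both nodes stopped)
    inactive = (
        "eth0:0 Link encap:Ethernet HWaddr 00:0C:29:01:1A:6C "
        "UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1"
    )
    # Output captured on the node that currently owns the virtual IP
    active = (
        "eth0:0 Link encap:Ethernet HWaddr 00:0C:29:56:84:0F "
        "inet addr:192.168.137.37 Bcast:192.168.137.255 Mask:255.255.255.0"
    )
    print(alias_ipv4(inactive))  # prints None
    print(alias_ipv4(active))    # prints 192.168.137.37
```

Running the check on both nodes after each step makes it easy to confirm that only one of them holds 192.168.137.37 at a time.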
It is quite clear: node 2 finally got its moment in the spotlight. It has taken over the resources from node 1, and the virtual IP and the resources now run on node 2. A quick check confirms it. First the virtual IP:

    [root@~]# ifconfig eth0:0
    eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:01:1A:6C
              inet addr:192.168.137.37  Bcast:192.168.137.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              Interrupt:185 Base address:0x1400

The virtual IP is in place.

    [root@~]# ps -ef | grep

A whole list of processes shows up, so everything is OK.

Now restore eth0 on node 1. This is what node 2 logs:

    heartbeat[13845]: 2010/01/31_01:54:36 info: wants to go standby [foreign]
    heartbeat[13845]: 2010/01/31_01:54:36 info: standby: can take our foreign resources
    heartbeat[14941]: 2010/01/31_01:54:36 info: give up foreign HA resources (standby).
    ResourceManager[14951]: 2010/01/31_01:54:36 info: Releasing resource group: 192.168.137.37 ipvsadm
    ResourceManager[14951]: 2010/01/31_01:54:36 info: Running /etc/init.d/stop
    ResourceManager[14951]: 2010/01/31_01:54:48 info: Running /etc/init.d/ipvsadm stop IPaddr 192.168.137.37 stop
    IPaddr[15112]: 2010/01/31_01:54:48 INFO: /sbin/route -n del -host 192.168.137.37
    IPaddr[15112]: 2010/01/31_01:54:48 INFO: /sbin/ifconfig eth0:0 192.168.137.37 down
    IPaddr[15112]: 2010/01/31_01:54:48 INFO: IP Address 192.168.137.37 released
    IPaddr[15042]: 2010/01/31_01:54:48 INFO: IPaddr Success
    heartbeat[14941]: 2010/01/31_01:54:48 info: foreign HA resource release completed (standby).
    heartbeat[13845]: 2010/01/31_01:54:48 info: Local standby process completed [foreign].
    heartbeat[13845]: 2010/01/31_01:54:50 WARN: 1 lost packet(s) for [[602:604]
    heartbeat[13845]: 2010/01/31_01:54:50 info: remote resource transition completed.
    heartbeat[13845]: 2010/01/31_01:54:50 info: No pkts missing from
    heartbeat[13845]: 2010/01/31_01:54:50 info: Other node completed standby takeover of foreign resources.
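When ha-log grows long, the takeover and give-back moments can be pulled out by filtering for the ResourceManager entries that acquire or release a resource group. The sketch below is illustrative only: resource_group_events is a made-up helper, and the pattern assumes the line layout seen in the logs of this article.

```python
import re

# Matches ResourceManager entries such as:
# ResourceManager[14560]: 2010/01/31_01:50:29 info: Acquiring resource group: 192.168.137.37 ipvsadm
GROUP_EVENT = re.compile(
    r"ResourceManager\[\d+\]:\s+(?P<time>\S+)\s+info:\s+"
    r"(?P<action>Acquiring|Releasing) resource group:\s+(?P<group>.+?)\s*$"
)


def resource_group_events(log_lines):
    """Return (timestamp, action, group) tuples for acquire/release entries, in log order."""
    events = []
    for line in log_lines:
        match = GROUP_EVENT.search(line)
        if match:
            events.append((match.group("time"), match.group("action"), match.group("group")))
    return events


if __name__ == "__main__":
    sample = [
        "heartbeat[13845]: 2010/01/31_01:50:03 info: wants to go standby [all]",
        "ResourceManager[14560]: 2010/01/31_01:50:29 info: Acquiring resource group: 192.168.137.37 ipvsadm",
        "IPaddr[14703]: 2010/01/31_01:50:30 INFO: IPaddr Success",
        "ResourceManager[14951]: 2010/01/31_01:54:36 info: Releasing resource group: 192.168.137.37 ipvsadm",
    ]
    for event in resource_group_events(sample):
        print(event)
```

On the node 2 log from this test, the two events it reports bracket the period during which node 2 held the virtual IP.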
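The log makes it quite clear: node 2 is back in standby mode, and its life as a supporting actor begins again. The lead actor, node 1, is back on stage.

Readers who are interested can carry on with different experiments to verify the behavior. inthirties will stop here; if you have questions, feel free to contact me and we can look into them together.

This HA setup is a simple solution that only implements failover. HA also covers load balancing (lb) and parallel computing, neither of which is implemented by the approach here. Compared with Oracle RAC, its functionality is very limited. It is a low-cost HA solution, worth trying for applications with low-cost requirements.

In the next article, inthirties will take the Oracle service as an example and implement low-cost HA with heartbeat, to explore the practical use of heartbeat further. Stay tuned.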

