
HG_REPMGR autofailover: automatic failover

Contents

  • Purpose
  • Details

Purpose

A reference for configuring automatic failover with HG_REPMGR.

Details

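To configure automatic failover for the cluster, the repmgrd daemon must be running on every node in the cluster. When the primary fails, a suitable standby is automatically promoted to be the new primary and continues serving clients. An example follows.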

  1. Configure postgresql.replication.conf (all nodes)

On top of the existing postgresql.replication.conf settings described earlier, add the following parameter:

shared_preload_libraries='repmgr'

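or set it from SQL. In either form the new value replaces the previous list, so keep any libraries that are already preloaded (here pg_pathman and timescaledb):

alter system set shared_preload_libraries = pg_pathman, timescaledb, repmgr;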

Restart the database (a change to shared_preload_libraries only takes effect after a restart):

pg_ctl restart
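After the restart, the preloaded list can be read back in psql with SHOW shared_preload_libraries. The small POSIX sh helper below (has_preload_lib is a hypothetical name, not part of HGDB or repmgr) is a sketch that tests whether a given library appears in such a comma-separated value, tolerating spaces and quotes:

```shell
# Hypothetical helper (not part of HGDB/repmgr): succeeds (exit 0) if the
# library named in $2 appears in the comma-separated
# shared_preload_libraries value passed in $1.
has_preload_lib() {
    # Drop quotes and spaces, put one library per line, then look for an
    # exact, fixed-string match of the wanted library name.
    printf '%s\n' "$1" | tr -d "' \"" | tr ',' '\n' | grep -qxF -- "$2"
}
```

Example usage, assuming psql can connect with its default settings: `has_preload_lib "$(psql -Atc 'SHOW shared_preload_libraries')" repmgr && echo "repmgr is preloaded"`.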
  2. Configure hg_repmgr.conf (all nodes)

Add the following parameters to the existing hg_repmgr.conf file (in follow_command, repmgrd replaces %n with the node ID of the new primary):

failover=automatic
promote_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote'
follow_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby follow --upstream-node-id=%n'
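The parameters above decide whether repmgrd fails over; how long it keeps retrying the lost primary first is controlled by separate reconnection settings. The fragment below is a sketch assuming the repmgr 4.x parameter names reconnect_attempts and reconnect_interval; the values shown are assumed to be the defaults and agree with the "6 attempts" and "sleeping 10 seconds" lines in the failover log of this example. Check the names against the HG_REPMGR version in use before relying on them.

```ini
# Sketch (assumed repmgr 4.x parameter names and defaults):
# number of times repmgrd retries a lost upstream before failing over
reconnect_attempts=6
# seconds repmgrd waits between those retries
reconnect_interval=10
```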

To send the repmgr log to a fixed log file, add the log_file parameter, for example:

log_file='/opt/highgo/5.6.1/conf/data/log/hg_repmgr.log'

To keep this log file from growing indefinitely, configure the system's logrotate (detailed steps omitted here).
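A minimal logrotate sketch for the log_file path above, assuming it is saved as /etc/logrotate.d/hg_repmgr (a hypothetical file name) and that the database runs as the OS user highgo. copytruncate is used so repmgrd does not need to be signalled or restarted after rotation, at the cost of possibly losing a few lines written during the copy; keeping 7 daily copies is an arbitrary example.

```
# /etc/logrotate.d/hg_repmgr  (hypothetical file name)
/opt/highgo/5.6.1/conf/data/log/hg_repmgr.log {
    # rotate once a day and keep 7 compressed old copies
    daily
    rotate 7
    compress
    # do not fail if the log is missing, do not rotate an empty log
    missingok
    notifempty
    # copy, then truncate in place, so repmgrd keeps writing to the same file
    copytruncate
    # rotate as the database OS user (assumed to be highgo)
    su highgo highgo
}
```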

  3. Start the repmgrd process (all nodes)
repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid

(-d runs repmgrd in the background as a daemon, -p sets its pid file.) In the sample session below, the first attempt (run without -f) fails because the repmgr extension is not installed in database "highgo". After the extension is created and repmgrd is started with -f, it begins monitoring on both nodes, and each node writes its pid file:

[highgo@dbrs conf]$ repmgrd -d -p /tmp/hg_repmgrd.pid
[2019-05-06 14:02:42] [NOTICE] repmgrd (repmgrd 4.2) starting up
[2019-05-06 14:02:42] [INFO] connecting to database ""
[2019-05-06 14:02:43] [ERROR] repmgr extension not found on this node
[2019-05-06 14:02:43] [DETAIL] repmgr extension is available but not installed in database "highgo"
[2019-05-06 14:02:43] [HINT] check that this node is part of a repmgr cluster
[highgo@dbrs conf]$

highgo=# \c
You are now connected to database "highgo" as user "highgo".
create extension repmgr;

[highgo@dbrs conf]$ repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid
[2019-05-06 14:21:21] [NOTICE] repmgrd (repmgrd 4.2) starting up
[2019-05-06 14:21:21] [INFO] connecting to database "host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2"
[highgo@dbrs conf]$ INFO:  set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid
[2019-05-06 14:21:21] [NOTICE] starting monitoring of node "dbrs" (ID: 1)
[2019-05-06 14:21:21] [NOTICE] monitoring cluster primary "dbrs" (node ID: 1)

[highgo@dbrs2 conf]$ repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid
[2019-05-06 14:21:50] [NOTICE] repmgrd (repmgrd 4.2) starting up
[2019-05-06 14:21:50] [INFO] connecting to database "host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2"
[highgo@dbrs2 conf]$ INFO:  set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid
[2019-05-06 14:21:50] [NOTICE] starting monitoring of node "dbrs2" (ID: 2)
[2019-05-06 14:21:50] [INFO] monitoring connection to upstream node "dbrs" (node ID: 1)

[highgo@dbrs conf]$ ls -atl /tmp/hg_repmgrd.pid
-rw-rw-r--. 1 highgo highgo 5 May  6 14:21 /tmp/hg_repmgrd.pid
[highgo@dbrs conf]$

[highgo@dbrs2 conf]$ ls -atl /tmp/hg_repmgrd.pid
-rw-rw-r--. 1 highgo highgo 5 May  6 14:21 /tmp/hg_repmgrd.pid
[highgo@dbrs2 conf]$
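The ls check above only shows that the pid file exists. A slightly stronger check, sketched below under a hypothetical helper name, reads the PID from the file and asks the kernel whether that process is still alive with kill -0. It cannot tell whether the PID has since been reused by another program, so treat it as a quick sanity check only.

```shell
# Hypothetical helper: succeed if the process whose PID is stored in the
# given pid file (default: /tmp/hg_repmgrd.pid, as used above) is alive.
repmgrd_running() {
    pidfile=${1:-/tmp/hg_repmgrd.pid}
    # A missing or empty pid file means repmgrd was not started here.
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}
```

Example usage: `repmgrd_running /tmp/hg_repmgrd.pid && echo "repmgrd is running"`.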

Question: does this background process have to be started manually every time the server is rebooted?

Developer's answer: currently yes; it will be changed to start automatically in a later version.
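Until repmgrd starts automatically, one possible workaround on systemd-based hosts is a unit file like the sketch below. The unit name, the repmgrd location under /opt/highgo/5.6.1/bin and the highgo OS user are assumptions inferred from the paths in this article; if HGDB itself is managed by systemd, also add After= and Requires= on its unit so that repmgrd starts only once the database is up.

```ini
# /etc/systemd/system/hg_repmgrd.service  (hypothetical name and paths)
[Unit]
Description=HG_REPMGR repmgrd daemon (sketch)
After=network-online.target
Wants=network-online.target

[Service]
# repmgrd -d forks into the background, so systemd tracks it via PIDFile
Type=forking
User=highgo
# Same command line as used above; the repmgrd path is an assumption
ExecStart=/opt/highgo/5.6.1/bin/repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid
PIDFile=/tmp/hg_repmgrd.pid
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After creating the file, `systemctl daemon-reload` followed by `systemctl enable --now hg_repmgrd.service` would register and start it.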

Check the cluster status

[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------
 1  | dbrs  | primary | * running |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
 2  | dbrs2 | standby |   running | dbrs     | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2
[highgo@dbrs conf]$
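When scripting around failover it can be handy to pull the current primary out of this table. The awk-based sketch below (cluster_primary is a hypothetical name) assumes the column layout shown above, with fields separated by |, and prints every node whose Role is primary and whose Status is "* running":

```shell
# Hypothetical helper: read "repmgr cluster show" output on stdin and print
# the name of each node listed as a running primary.
cluster_primary() {
    awk -F'|' '
        # strip leading and trailing blanks from a field
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Role is column 3, Status column 4; header and separator never match
        trim($3) == "primary" && trim($4) == "* running" { print trim($2) }
    '
}
```

Example usage: `repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show | cluster_primary` prints dbrs for the table above.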

Simulate a primary failure

1) Shut down the database on node1 (dbrs):

pg_ctl stop

2) Check the cluster status on node2 (dbrs2):

[highgo@dbrs2 conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------
 1  | dbrs  | primary | - failed  |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
 2  | dbrs2 | primary | * running |          | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "dbrs" (ID: 1)
[highgo@dbrs2 conf]$

node2 has now been promoted to primary.

Log (repmgrd on node2):

[highgo@dbrs2 conf]$ [2019-05-06 14:24:14] [WARNING] unable to connect to upstream node "dbrs" (node ID: 1)
[2019-05-06 14:24:14] [INFO] checking state of node 1, 1 of 6 attempts
[2019-05-06 14:24:14] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:24] [INFO] checking state of node 1, 2 of 6 attempts
[2019-05-06 14:24:24] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:34] [INFO] checking state of node 1, 3 of 6 attempts
[2019-05-06 14:24:34] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:44] [INFO] checking state of node 1, 4 of 6 attempts
[2019-05-06 14:24:44] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:54] [INFO] checking state of node 1, 5 of 6 attempts
[2019-05-06 14:24:54] [INFO] sleeping 10 seconds until next reconnection attempt
[highgo@dbrs2 conf]$ [2019-05-06 14:25:04] [INFO] checking state of node 1, 6 of 6 attempts
[2019-05-06 14:25:04] [WARNING] unable to reconnect to node 1 after 6 attempts
[2019-05-06 14:25:04] [NOTICE] this node is the only available candidate and will now promote itself
[2019-05-06 14:25:04] [INFO] promote_command is:
  "repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote"
NOTICE: promoting standby to primary
DETAIL: promoting server "dbrs2" (ID: 2) using "/opt/highgo/5.6.1/bin/pg_ctl -w -D '/opt/highgo/5.6.1/data' promote"
DETAIL: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "dbrs2" (ID: 2) was successfully promoted to primary
[2019-05-06 14:25:10] [INFO] switching to primary monitoring mode
[2019-05-06 14:25:10] [NOTICE] monitoring cluster primary "dbrs2" (node ID: 2)
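The timestamps in this log show the retry arithmetic: the first check runs immediately at 14:24:14, repmgrd then sleeps 10 seconds before each of the remaining five attempts, and the sixth check at 14:25:04 is followed directly by promotion, i.e. (6 - 1) x 10 = 50 seconds. The sketch below reproduces that estimate; the argument names follow the assumed repmgr parameters reconnect_attempts and reconnect_interval, and it ignores both the time needed to notice the failure and the promotion itself (about 6 seconds here, 14:25:04 to 14:25:10).

```shell
# Sketch: seconds repmgrd spends retrying a failed upstream before it
# promotes a standby. The first check is immediate; each later attempt
# is preceded by one sleep of reconnect_interval seconds.
failover_wait_seconds() {
    attempts=$1   # reconnect_attempts (6 in the log above)
    interval=$2   # reconnect_interval in seconds (10 in the log above)
    echo $(( (attempts - 1) * interval ))
}
```

With the values from the log, `failover_wait_seconds 6 10` prints 50, matching the gap between 14:24:14 and 14:25:04.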
  4. After node1 has recovered, it can rejoin the cluster

If node1 is simply started again, it still regards itself as primary, and its repmgr metadata flags dbrs2 as "! running as primary":

[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
 ID | Name  | Role    | Status               | Upstream | Location | Connection string
----+-------+---------+----------------------+----------+----------+------------------------------------------------------------
 1  | dbrs  | primary | * running            |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
 2  | dbrs2 | standby | ! running as primary | dbrs     | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2

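1) Rejoin the cluster. Run this on the failed node; host points to the new primary, and after rejoining the node runs as a standby. With --force-rewind, repmgr runs pg_rewind to resynchronize the old primary's data directory with the new primary.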

repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf node rejoin -d 'host=dbrs2 dbname=hgrepmgr user=hgrepmgr' --force-rewind --verbose

Note: shut down HGDB on node1 before running this command.
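The note above can be enforced with a small pre-check. This sketch relies on pg_ctl status exiting with 3 when the server is not running (and 0 when it is), as documented for PostgreSQL's pg_ctl; the helper name ready_to_rejoin is hypothetical, and the PG_CTL and PGDATA defaults are simply the paths that appear in this article's logs.

```shell
# Hypothetical pre-check before "repmgr node rejoin": succeed only if the
# local server is stopped. pg_ctl status exits 0 when the server is running
# and 3 when it is not; any other code (e.g. 4, data directory not
# accessible) is reported as an error.
ready_to_rejoin() {
    "${PG_CTL:-/opt/highgo/5.6.1/bin/pg_ctl}" status \
        -D "${PGDATA:-/opt/highgo/5.6.1/data}" >/dev/null 2>&1
    rc=$?
    case $rc in
        3) return 0 ;;   # stopped: safe to run node rejoin
        0) echo "server is still running; run pg_ctl stop first" >&2
           return 1 ;;
        *) echo "pg_ctl status failed (exit code $rc)" >&2
           return 2 ;;
    esac
}
```

Example usage: `ready_to_rejoin && repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf node rejoin -d 'host=dbrs2 dbname=hgrepmgr user=hgrepmgr' --force-rewind --verbose`.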

[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf node rejoin -d 'host=dbrs2 dbname=hgrepmgr user=hgrepmgr' --force-rewind --verbose
NOTICE: using provided configuration file "/opt/highgo/5.6.1/conf/hg_repmgr.conf"
INFO: prerequisites for using pg_rewind are met
INFO: 0 files copied to "/tmp/repmgr-config-archive-dbrs"
NOTICE: executing pg_rewind
NOTICE: 0 files copied to /opt/highgo/5.6.1/data
INFO: directory "/tmp/repmgr-config-archive-dbrs" deleted
INFO: deleting "recovery.done"
NOTICE: setting node 1's primary to node 2
NOTICE: starting server using "/opt/highgo/5.6.1/bin/pg_ctl -w -D '/opt/highgo/5.6.1/data' start"
INFO: demoted primary is pingable
INFO: node 1 has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
[highgo@dbrs conf]$

2) Check the cluster status with repmgr cluster show:

[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------
 1  | dbrs  | standby |   running | dbrs2    | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
 2  | dbrs2 | primary | * running |          | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2
[highgo@dbrs conf]$
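A similar sketch can confirm that the cluster is consistent again after the rejoin: no node whose Status is flagged with ! (inconsistent), - (failed) or ? (assumed here to be repmgr's marker for unknown or unreachable nodes), and exactly one running primary. cluster_healthy is a hypothetical name; it exits 0 for the final table above and 1 for the inconsistent table shown before the rejoin.

```shell
# Hypothetical helper: read "repmgr cluster show" output on stdin; exit 0
# only if no Status is flagged with "!", "-" or "?" and exactly one node
# is a "* running" primary.
cluster_healthy() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        BEGIN { bad = 0; primaries = 0 }
        # node rows have 7 columns; skip the header row (ID column)
        NF >= 7 && trim($1) != "ID" {
            status = trim($4)
            if (status ~ /^[!?-]/) bad++
            if (trim($3) == "primary" && status == "* running") primaries++
        }
        END { if (bad == 0 && primaries == 1) exit 0; exit 1 }
    '
}
```

Example usage: `repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show | cluster_healthy && echo "cluster OK"`.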
