2022-09-19
Barman Backup, Recovery and Migration: Introduction
In a perfect world, there would be no need for a backup. However, it is important, especially in business environments, to be prepared for when the “unexpected” happens. In a database scenario, the unexpected could take any of the following forms:
• data corruption
• system failure (including hardware failure)
• human error
• natural disaster
In such cases, any ICT manager or DBA should be able to fix the incident and recover the database in the shortest time possible. We normally refer to this discipline as disaster recovery, and more broadly as business continuity.
Within business continuity, it is important to be familiar with two fundamental metrics, as defined by Wikipedia:
• Recovery Point Objective (RPO): “maximum targeted period in which data might be lost from an IT service due to a major incident”
• Recovery Time Objective (RTO): “the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity”
In a few words, RPO represents the maximum amount of data you can afford to lose, while RTO represents the maximum downtime you can afford for your service.
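To make the two metrics concrete, here is a minimal Python sketch that measures one incident against RPO and RTO targets. The `Incident` class, the function names, and the timestamps are all hypothetical and chosen only for illustration. They are not part of Barman or PostgreSQL.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    last_recoverable_point: datetime  # latest moment whose data can be restored
    failure_time: datetime            # when the incident happened
    service_restored: datetime        # when the service was available again


def data_loss_window(incident: Incident) -> timedelta:
    """Data written in this window is lost; compare it with the RPO."""
    return incident.failure_time - incident.last_recoverable_point


def downtime(incident: Incident) -> timedelta:
    """Time the service was unavailable; compare it with the RTO."""
    return incident.service_restored - incident.failure_time


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> bool:
    return data_loss_window(incident) <= rpo and downtime(incident) <= rto


# Made-up example: 5 minutes of data lost, 45 minutes of downtime.
incident = Incident(
    last_recoverable_point=datetime(2022, 9, 16, 17, 55),
    failure_time=datetime(2022, 9, 16, 18, 0),
    service_restored=datetime(2022, 9, 16, 18, 45),
)
print(data_loss_window(incident))  # 0:05:00
print(downtime(incident))          # 0:45:00
print(meets_objectives(incident, rpo=timedelta(minutes=15), rto=timedelta(minutes=30)))  # False
```

In this made-up incident the RPO target is met but the RTO target is not, so the objectives as a whole are missed.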
Understandably, we all want RPO=0 (“zero data loss”) and RTO=0 (zero downtime, utopia), even if it is our grandmother’s recipe website. In reality, a careful cost analysis phase allows you to determine your business continuity requirements.
Fortunately, with an open source stack composed of Barman and PostgreSQL, you can achieve RPO=0 thanks to synchronous streaming replication. RTO is more the focus of a High Availability solution, like repmgr. Therefore, by integrating Barman and repmgr, you can dramatically reduce RTO to nearly zero. Based on our experience at EnterpriseDB, we can confirm that PostgreSQL open source clusters with Barman and repmgr can easily achieve more than 99.99% uptime over a year, if properly configured and monitored.
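As a rough check on what a 99.99% yearly uptime figure allows, the arithmetic can be sketched in Python. The function name is hypothetical, and the calculation assumes a 365-day year with downtime counted in whole-service minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    budget = downtime_budget_minutes(target)
    print(f"{target}% uptime allows about {budget:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% uptime leaves a budget of roughly 52.6 minutes of downtime per year, which is why RTO matters as much as RPO.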
In any case, it is important for us to place more emphasis on the cultural aspects of disaster recovery than on the actual tools. Tools without human beings are useless. Our mission with Barman is to promote a culture of disaster recovery that:
• focuses on backup procedures
• focuses even more on recovery procedures
• relies on education and training for your team members in the theory and practice of PostgreSQL’s crash recovery, backup, Point-In-Time-Recovery, and replication
• promotes testing your backups (only a backup that is tested can be considered to be valid), either manually or automatically (be creative with Barman’s hook scripts!)
• fosters regular practice of recovery procedures by all members of your devops team (yes, developers too, not just system administrators and DBAs)
• calls for regularly scheduled drills and disaster recovery simulations with the team every 3-6 months
• relies on continuous monitoring of PostgreSQL and Barman, so that any anomalies are identified promptly
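The hook-script idea mentioned in the list above can be sketched in Python. This is only an illustration: it assumes that Barman passes `BARMAN_PHASE`, `BARMAN_STATUS`, `BARMAN_SERVER`, and `BARMAN_BACKUP_ID` to backup hook scripts as environment variables, and that a completed backup reports the status `DONE`. Check both assumptions against the hook script documentation for your Barman version. The functions `should_verify` and `main` are hypothetical, and the restore test itself is left as a placeholder.

```python
import os
import sys
from typing import Mapping


def should_verify(env: Mapping[str, str]) -> bool:
    """True when called as a post-backup hook for a completed backup.

    The variable names and the "DONE" status are assumptions; see the lead-in.
    """
    return env.get("BARMAN_PHASE") == "post" and env.get("BARMAN_STATUS") == "DONE"


def main(env: Mapping[str, str] = os.environ) -> int:
    server = env.get("BARMAN_SERVER", "<unknown>")
    backup_id = env.get("BARMAN_BACKUP_ID", "<unknown>")
    if not should_verify(env):
        print(f"skipping {server}/{backup_id}: no completed backup to test", file=sys.stderr)
        return 0
    # Placeholder: trigger your own restore test here, for example by
    # queueing a job that recovers this backup onto a scratch host.
    print(f"restore test requested for {server}/{backup_id}")
    return 0


if __name__ == "__main__":
    main()
```

The decision is kept in a separate pure function so it can be tested without a Barman installation. The restore test is left out because it depends entirely on your environment.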
Moreover, do everything you can to prepare yourself and your team for when the disaster happens (yes, when), because when it happens:
• It is going to be a Friday evening, most likely right when you are about to leave the office.
• It is going to be when you are on holiday (right in the middle of your cruise around the world) and somebody else has to deal with it.
• It is certainly going to be stressful.
• You will regret not being sure that the last available backup is valid.
• Unless you know approximately how long it takes to recover, every second will seem like forever.
Be prepared, don’t be scared. In 2011, with these goals in mind, 2ndQuadrant started the development of Barman, now one of the most used backup tools for PostgreSQL. Barman is an acronym for “Backup and Recovery Manager”. Currently, Barman works only on Linux and Unix operating systems.
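Earlier posts on this blog covered the FTS high-availability process of the GP (Greenplum) database, mainly giving an overall walk-through of how FTS works. Because of flaws inherent in the design of the FTS process, later posts also looked at the use and internals of Patroni, a high-availability component for Postgres. Since then, there have been several cases in the field where incremental or full recovery failed, as well as problems with recovery taking too long and generating too much I/O. This high-availability series will focus on gprecoverseg, the Greenplum recovery tool, covering its incremental (pg_rewind) and full (pg_basebackup) recovery modes, along with the internals of Barman, pg_probackup, pgbackrest and similar tools. The aim is to bring their good design ideas into the GP recovery workflow and reduce the chances of complaints from the business teams.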
https://github.com/EnterpriseDB/barman