Postgres-XL Database GTM: GTM and Global Transaction Management

Reader contribution, 2022-12-02

Review of PostgreSQL Transaction Management Internals

In PostgreSQL, each transaction is given a unique ID called the transaction ID (or XID). XIDs are assigned in ascending order so that it is possible to tell which of two transactions is older. [20] Each tuple carries a set of XIDs that identify the transactions which created and deleted it, and a transaction consults them when it tries to read the tuple. [21] So if the target tuple was created by a transaction that is still active, that transaction has been neither committed nor aborted, and the reading transaction should ignore the tuple. In this way (in practice, this is done by the versup module in the PostgreSQL core), if we give each transaction a unique transaction ID throughout the system and maintain a snapshot of which transactions are active, not only on a single server but across all servers, we can maintain globally consistent visibility of each tuple, even when a server accepts new statements from transactions that are running on other servers.

This information is stored in the “xmin” and “xmax” fields of each row of a table. When we INSERT rows, the XID of the inserting transaction is recorded in the xmin field. When we modify rows (with an UPDATE or DELETE statement), PostgreSQL does not simply overwrite the old rows. Instead, it “marks” the old rows as “deleted” by writing the modifying transaction’s XID to the xmax field. In the case of UPDATE (just like INSERT), new rows are created whose xmin field is “marked” with the XID of the creating transaction.

The “xmin” and “xmax” fields are used to determine which rows are visible to a transaction. To do this, PostgreSQL needs data that indicates which transactions are running; this is called the “snapshot”.

Suppose a row of a table was created by some transaction and has not been deleted yet. If the creating transaction is still running, the row is visible to the transaction that created it but not to other transactions. If the creating transaction is no longer running, visibility depends on whether it committed or aborted: if it committed, the row is visible; if it aborted, the row is not visible.

Therefore, PostgreSQL needs two kinds of information: which transactions are running, and whether an old transaction was committed or aborted.

The former is obtained as the “snapshot.” PostgreSQL maintains the latter as the “CLOG” (commit log). PostgreSQL uses all of this information to determine which rows are visible to a given transaction.
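The visibility rules above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not PostgreSQL's actual logic (footnote [21] points to the precise rules): it ignores XID wraparound and the upper bound that a real snapshot carries. The names `Row`, `Snapshot`, `xid_effective`, `is_visible` and the dict-based `clog` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional

COMMITTED, ABORTED = "committed", "aborted"


@dataclass(frozen=True)
class Row:
    xmin: int                   # XID of the transaction that created the row
    xmax: Optional[int] = None  # XID of the transaction that deleted it, if any


@dataclass(frozen=True)
class Snapshot:
    running: frozenset          # XIDs that were running when the snapshot was taken


def xid_effective(xid, my_xid, snapshot, clog):
    """True if the work of `xid` counts as done from the reader's point of view."""
    if xid == my_xid:
        return True                    # a transaction sees its own changes
    if xid in snapshot.running:
        return False                   # still running: ignore its changes
    return clog.get(xid) == COMMITTED  # finished: only committed work counts


def is_visible(row, my_xid, snapshot, clog):
    """Simplified visibility check based on xmin, xmax, the snapshot and the CLOG."""
    if not xid_effective(row.xmin, my_xid, snapshot, clog):
        return False                   # the creating transaction does not count
    if row.xmax is None:
        return True                    # created and never deleted
    # The row is hidden only if the deleting transaction counts as done.
    return not xid_effective(row.xmax, my_xid, snapshot, clog)
```

For example, with `clog = {10: COMMITTED, 11: ABORTED}` and a snapshot in which XID 12 is running, a row created by XID 10 is visible to a reader with XID 20, while rows created by XID 11 or XID 12 are not.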

Making Transaction Management Global

In Postgres-XL, the following features of transaction management and visibility checking are extracted from the nodes and pulled into the GTM:

- Assigning XIDs globally to transactions (GXID, Global Transaction ID). This is done globally so that every transaction in the system can be identified.
- Providing snapshots. GTM collects the status of all transactions (running, committed, aborted, etc.) to provide snapshots globally (global snapshots). Note that each global snapshot includes GXIDs initiated by other Coordinators or Datanodes. This is needed because an older transaction may visit a new server after a while. In that case, if the GXID of such a transaction is not included in the snapshot, the transaction may be regarded as “old enough” and uncommitted rows may be read. If the GXID of such a transaction is included in the snapshot from the beginning, this inconsistency does not occur.

To do this, Postgres-XL introduces a dedicated component called GTM (Global Transaction Manager). GTM runs on one of the servers and provides a unique, ordered transaction ID to each transaction running on the Postgres-XL servers. Because this ID is globally unique, we call it the GXID (Global Transaction ID).

GTM receives GXID requests from transactions and provides GXIDs. It also tracks when each transaction starts and finishes, in order to generate the snapshots used to control the visibility of each tuple. Because a snapshot here is also a global property, it is called a Global Snapshot.

As long as each transaction runs with a GXID and a Global Snapshot, it maintains consistent visibility throughout the system, and it is safe to run transactions in parallel on any of the servers. Likewise, a transaction composed of multiple statements can be executed across multiple servers while maintaining database consistency.

GTM provides a Global Transaction ID to each transaction and keeps track of the status of all transactions, whether running, committed or aborted, in order to calculate the global snapshots that maintain tuple visibility.

For this purpose, each transaction reports when it starts and ends, as well as when it issues a PREPARE command in the two-phase commit protocol.
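As a rough illustration of this bookkeeping, the following Python sketch models a GTM-like object that hands out ordered GXIDs, records start, PREPARE and end reports, and computes a global snapshot as the set of transactions still in progress. `ToyGTM` and its method names are hypothetical; the real GTM is a separate server process with its own wire protocol, which is not modelled here.

```python
import threading


class ToyGTM:
    """Hypothetical in-memory stand-in for GTM, for illustration only."""

    def __init__(self, first_gxid=1):
        self._lock = threading.Lock()  # serializes GXID assignment and snapshots
        self._next = first_gxid        # arbitrary starting value for this sketch
        self._running = set()          # GXIDs that have started but not ended
        self._status = {}              # last reported status per GXID

    def begin(self):
        """A transaction reports its start and receives a unique, ordered GXID."""
        with self._lock:
            gxid = self._next
            self._next += 1
            self._running.add(gxid)
            self._status[gxid] = "running"
            return gxid

    def prepare(self, gxid):
        """PREPARE was issued; the transaction is not finished yet."""
        with self._lock:
            self._status[gxid] = "prepared"

    def finish(self, gxid, committed):
        """A transaction reports its end, either committed or aborted."""
        with self._lock:
            self._running.discard(gxid)
            self._status[gxid] = "committed" if committed else "aborted"

    def snapshot(self):
        """Global snapshot: every GXID in progress, whichever node started it."""
        with self._lock:
            return frozenset(self._running)
```

Because the snapshot is an immutable copy, a snapshot taken earlier keeps listing a transaction as running even after that transaction has finished, which is what lets a reader keep ignoring its rows.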

Each transaction requests snapshots according to its transaction isolation level, as in PostgreSQL. If the isolation level is “read committed”, the transaction requests a snapshot for each statement. If it is “serializable”, the transaction requests a snapshot at the beginning of the transaction and reuses it throughout the transaction.
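The difference between the two isolation levels can be sketched as follows. `SnapshotPolicy` and `fetch_snapshot` are hypothetical names; `fetch_snapshot` stands in for a snapshot request to GTM. In this sketch the “serializable” snapshot is fetched on first use and then reused, which is an assumption made to keep the example short.

```python
class SnapshotPolicy:
    """Decides when a transaction asks for a new snapshot (illustrative only)."""

    LEVELS = ("read committed", "serializable")

    def __init__(self, isolation_level, fetch_snapshot):
        if isolation_level not in self.LEVELS:
            raise ValueError(f"unsupported isolation level: {isolation_level!r}")
        self.isolation_level = isolation_level
        self._fetch = fetch_snapshot  # callable standing in for a GTM request
        self._cached = None           # snapshot reused by "serializable"

    def snapshot_for_statement(self):
        if self.isolation_level == "read committed":
            return self._fetch()      # a fresh snapshot for every statement
        if self._cached is None:
            self._cached = self._fetch()  # fetched once per transaction
        return self._cached
```

With a busy workload, “read committed” therefore generates one snapshot request per statement, which is part of why the number of GTM interactions matters.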

Improving GTM Performance

Because GTM can be regarded as “serializing” all transaction processing, one might expect GTM to be a performance bottleneck.

In fact, GTM can limit overall scalability. GTM should not be used in a very slow network environment such as a wide area network. The GTM architecture is intended for use on a Gigabit local network. It is recommended to install Postgres-XL on a local Gigabit network with minimum latency, that is, with as few switches as possible in the connections among GTM, Coordinators and Datanodes. In addition, consider putting all components on their own subnet if the systems have multiple network ports.

Primitive GTM Implementation

A primitive GTM implementation can be done as follows:

- The Coordinator backend is provided with a GTM client library to obtain GXIDs and snapshots and to report transaction status.
- GTM opens a port to accept connections from each Coordinator and Datanode backend. When GTM accepts a connection, it creates a thread (GTM Thread) to handle requests to GTM from the connected Coordinator backend.
- The GTM Thread receives each request, records it, and sends the GXID, snapshot or other response to the Coordinator backend.
- These steps are repeated until the Coordinator backend requests a disconnect.
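The thread-per-connection structure described above might look like the following Python sketch. It is an in-process simulation under stated assumptions: a pair of queues stands in for a network connection, and the names `PrimitiveGTM`, `accept`, `call` and the request names `begin`, `snapshot`, `end` and `disconnect` are all hypothetical, not the real GTM client library or protocol.

```python
import itertools
import queue
import threading


class PrimitiveGTM:
    """Toy GTM that starts one dedicated thread per accepted connection."""

    def __init__(self):
        self._lock = threading.Lock()     # protects the shared transaction state
        self._gxids = itertools.count(1)  # source of ordered GXIDs
        self._running = set()

    def accept(self):
        """Accept one simulated connection and start a GTM Thread for it."""
        requests, responses = queue.Queue(), queue.Queue()
        worker = threading.Thread(
            target=self._serve, args=(requests, responses), daemon=True
        )
        worker.start()
        return requests, responses

    def _serve(self, requests, responses):
        # Receive a request, record it, send the response; repeat until disconnect.
        while True:
            op, arg = requests.get()
            if op == "disconnect":
                responses.put(("bye", None))
                return
            with self._lock:
                if op == "begin":
                    gxid = next(self._gxids)
                    self._running.add(gxid)
                    responses.put(("gxid", gxid))
                elif op == "snapshot":
                    responses.put(("snapshot", frozenset(self._running)))
                elif op == "end":
                    self._running.discard(arg)
                    responses.put(("ok", arg))
                else:
                    responses.put(("error", op))


def call(conn, op, arg=None):
    """Client side: send one request and wait for its response."""
    requests, responses = conn
    requests.put((op, arg))
    return responses.get(timeout=5)
```

Every backend costs GTM one thread and one connection in this design, and every request is a separate round trip, which motivates the proxy described next.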

GTM Proxy Implementation

Each transaction issues requests to GTM frequently. By using a GTM-Proxy, we can collect the requests from each Coordinator into a single block and so reduce the number of interactions.

In this configuration, Coordinator and Datanode backends do not connect to GTM directly. Instead, a GTM Proxy sits between GTM and the Coordinator backends to group multiple requests and responses. The GTM Proxy, like the GTM explained in the previous sections, accepts connections from Coordinator backends. However, it does not create a new thread for each connection. The following paragraphs explain how the GTM Proxy is initialized and how it handles requests from Coordinator backends.

The GTM Proxy, together with GTM, is initialized as follows:

- GTM starts up normally, but can now accept connections from GTM Proxies.
- The GTM Proxy starts up and creates GTM Proxy Threads. Each GTM Proxy Thread connects to GTM in advance. The number of GTM Proxy Threads can be specified at startup. A typical number of threads is one or two, which reduces the number of connections between GTM and the Coordinators.
- The GTM Proxy Main Thread waits for connection requests from each backend.

When a Coordinator backend requests a connection, the Proxy Main Thread assigns a GTM Proxy Thread to handle its requests. One GTM Proxy Thread therefore handles multiple Coordinator backends. If a Coordinator has one hundred Coordinator backends and one GTM Proxy Thread, that thread takes care of all one hundred backends.

The GTM Proxy Thread then scans all the requests from the Coordinator backends. If the Coordinator is busy, a single scan is expected to capture more requests. The proxy can therefore group many requests into a single block of requests, reducing the number of interactions between GTM and the Coordinator.

Furthermore, a single scan may contain multiple requests for snapshots. Because these requests can be regarded as received at the same time, a single snapshot can serve all of them. This reduces the amount of data that GTM has to provide.
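The grouping step can be sketched as below. The function names `build_batch` and `fan_out`, and the representation of a request as a `(backend_id, op)` pair, are hypothetical; the sketch only shows how several snapshot requests seen in one scan collapse into one request whose answer is then shared.

```python
def build_batch(pending):
    """Group the requests captured in one scan into a single block for GTM.

    `pending` is a list of (backend_id, op) pairs in arrival order.
    Returns the block to send and the backends waiting for a snapshot.
    """
    batch = []
    snapshot_waiters = []
    for backend_id, op in pending:
        if op == "snapshot":
            if not snapshot_waiters:
                # The first snapshot request stands for all others in this scan.
                batch.append(("snapshot", None))
            snapshot_waiters.append(backend_id)
        else:
            batch.append((op, backend_id))
    return batch, snapshot_waiters


def fan_out(snapshot, snapshot_waiters):
    """Give the single snapshot returned by GTM to every waiting backend."""
    return {backend_id: snapshot for backend_id in snapshot_waiters}
```

In this sketch, five requests from five backends, three of which ask for a snapshot, become a block of three requests, and GTM sends one snapshot instead of three.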

Coordinator

The Coordinator handles SQL statements from applications, determines which Datanodes should be involved, and generates local SQL statements for each Datanode. In the simplest case, if a single Datanode is involved, the Coordinator simply proxies incoming statements to that Datanode. In more complicated cases, for example if the target Datanode cannot be determined, the Coordinator generates local statements for each Datanode and collects the results, materializing them at the Coordinator for further handling. In this case, the Coordinator will try to optimize the plan by:

- Pushing down the WHERE clause to Datanodes,
- Pushing down joins to Datanodes,
- Pushing down projection (the column list in the SELECT clause),
- Pushing down the ORDER BY clause, as well as other clauses.

If a transaction involves more than one Datanode and/or Coordinator, the Coordinator handles the transaction internally with the two-phase commit protocol.
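For readers unfamiliar with two-phase commit, here is a minimal, generic sketch of the protocol's control flow. It is not Postgres-XL's implementation: `Participant` and `two_phase_commit` are hypothetical names, and failure handling, logging and recovery are omitted.

```python
class Participant:
    """A node taking part in a distributed transaction (illustrative only)."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self):
        # Phase 1: promise to commit if asked, or refuse and abort locally.
        self.state = "prepared" if self.can_prepare else "aborted"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit on all participants or on none. Returns True if committed."""
    # Phase 1: every participant must prepare successfully.
    for p in participants:
        if not p.prepare():
            # One refusal aborts the whole transaction everywhere.
            for q in participants:
                if q.state != "aborted":
                    q.rollback()
            return False
    # Phase 2: all participants are prepared, so the commit can go ahead.
    for p in participants:
        p.commit()
    return True
```

The point of the first phase is that no node commits until every node has confirmed that it is able to.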

In the case of aggregate functions, Postgres-XL introduces a new function, the collection function, between the existing transition function and finalize function. The collection function runs on the Coordinator to collect all the intermediate results from the involved Datanodes. For details, see Section 37.10 and CREATE AGGREGATE in the Postgres-XL documentation.
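To make the three roles concrete, the sketch below computes an average in three stages. The function names are hypothetical and the state is a plain `(sum, count)` tuple chosen for this example; it is not how any built-in Postgres-XL aggregate is defined.

```python
def avg_transition(state, value):
    """Transition function: runs on a Datanode, once per local row."""
    total, count = state
    return (total + value, count + 1)


def avg_collection(collected, partial):
    """Collection function: runs on the Coordinator, once per Datanode result."""
    return (collected[0] + partial[0], collected[1] + partial[1])


def avg_final(state):
    """Finalize function: turns the combined state into the final answer."""
    total, count = state
    return total / count if count else None


def distributed_avg(rows_per_datanode):
    """Simulate the whole pipeline for a list of per-Datanode row lists."""
    collected = (0, 0)
    for rows in rows_per_datanode:
        state = (0, 0)
        for value in rows:
            state = avg_transition(state, value)      # work done on the Datanode
        collected = avg_collection(collected, state)  # work done on the Coordinator
    return avg_final(collected)
```

Averaging the per-Datanode averages would give the wrong answer when Datanodes hold different numbers of rows, which is why the intermediate state, not the final value, is what gets collected.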

When reading replicated tables, the Coordinator can choose any Datanode to read from. The most efficient choice is a Datanode running on the same hardware or virtual machine. This is called the preferred Datanode and can be specified by a GUC local to each Coordinator.

On the other hand, when writing replicated tables, all the Coordinators choose the same Datanode to begin with, in order to avoid update conflicts. This is called the primary Datanode.
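The two selection rules can be summarized in a small sketch. The function names are hypothetical, and falling back to the first Datanode when no preferred node is configured is an assumption made only for this example.

```python
def choose_read_node(datanodes, preferred=None):
    """Reads of a replicated table may use any Datanode; favor the preferred one."""
    if not datanodes:
        raise ValueError("no Datanodes available")
    if preferred in datanodes:
        return preferred
    return datanodes[0]  # assumption: any node is acceptable as a fallback


def write_order(datanodes, primary):
    """Writes to a replicated table start at the primary Datanode on every Coordinator."""
    if primary not in datanodes:
        raise ValueError(f"primary Datanode {primary!r} is not in the node list")
    return [primary] + [node for node in datanodes if node != primary]
```

The preferred Datanode can differ from one Coordinator to another, whereas the primary Datanode has to be the same for all of them so that conflicting writes meet at the same node first.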

Coordinators also take care of DDL statements. Because DDL statements modify the system catalogs, which are replicated on all the Coordinators and Datanodes, they are proxied to all the Coordinators and Datanodes. To synchronize the catalog update across all nodes, the Coordinator handles DDL internally with the two-phase commit protocol.

Datanode

While Coordinators handle cluster-wide SQL statements, Datanodes take care of only local matters. In this sense, Datanodes are essentially PostgreSQL servers, except that transaction management information, as well as other global values, is obtained from GTM.

Coordinator And Datanode Connection

The number of connections between Coordinators and Datanodes may grow over time. This can leave unused connections that waste system resources. Repeatedly opening and closing real connections requires Datanode backend initialization each time, which increases latency and also wastes system resources.

For example, as in the case of GTM, if each Coordinator has one hundred connections to applications and the cluster has ten Coordinators and, let us assume, ten Datanodes, then after a while each Coordinator backend may have a connection to every Datanode. Each Coordinator backend then holds ten connections to Datanodes, and each Coordinator holds one thousand (100 x 10) connections to Datanodes.

Because each backend consumes considerable resources for locks and other control information, and only a few of these connections are active at any given time, it is not a good idea to hold such unused connections between Coordinators and Datanodes.

To improve this, Postgres-XL is equipped with a connection pooler between Coordinators and Datanodes. When a Coordinator backend requires a connection to a Datanode, the pooler looks for an appropriate connection in the pool. If one is available, the pooler assigns it to the Coordinator backend. When the connection is no longer needed, the Coordinator backend returns it to the pooler. The pooler does not close the connection. It keeps the connection in the pool for later reuse, which keeps the Datanode backend running.
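The acquire and release cycle described above can be sketched as follows. `DatanodePooler` and the `connect` callable are hypothetical names, and opening a new connection when the pool has none available is an assumption of this sketch; limits, health checks and per-database or per-user pools are left out.

```python
from collections import defaultdict


class DatanodePooler:
    """Keeps idle Datanode connections open so that they can be reused."""

    def __init__(self, connect):
        self._connect = connect         # callable(node_name) -> new connection
        self._idle = defaultdict(list)  # node name -> idle connections

    def acquire(self, node):
        """Hand out an idle connection if there is one, else open a new one."""
        if self._idle[node]:
            return self._idle[node].pop()
        return self._connect(node)

    def release(self, node, conn):
        """Return a connection to the pool without closing it."""
        self._idle[node].append(conn)

    def idle_count(self, node):
        return len(self._idle[node])
```

A backend that acquires, releases and acquires again for the same Datanode gets the same connection object back, so the Datanode backend behind it is initialized only once.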

[20] More precisely, an XID is a 32-bit integer. When the XID reaches the maximum value, it wraps around to the lowest value (3, as of the latest definition). PostgreSQL has a means of handling this, as does Postgres-XL. For simplicity, it is not described in this document.

[21] This description is somewhat simplified for explanation. You will find the precise rules in the tqual.c file in PostgreSQL’s source code.

Source: https://postgres-xl.org/documentation/xc-overview-gtm.html
