Multi-Master Cluster Administration
multimaster is a Postgres Pro Enterprise extension with a set
of patches that turns Postgres Pro Enterprise into a synchronous shared-nothing
cluster to provide Online Transaction Processing (OLTP) scalability for read transactions and high availability with automatic disaster recovery.
As compared to a standard PostgreSQL master-slave cluster, a cluster configured with the multimaster extension offers the following benefits:
Cluster-wide transaction isolation
Synchronous logical replication and DDL replication
Working with temporary tables on each cluster node
Fault tolerance and automatic node recovery
PostgreSQL online upgrades
The multimaster extension replicates your
database to all nodes of the cluster and allows write transactions
on each node. To ensure data consistency in the case of concurrent
updates, multimaster enforces transaction
isolation cluster-wide, using multiversion concurrency control
(MVCC) at the read committed or repeatable read isolation levels. Any write
transaction is synchronously replicated to all nodes, which
increases commit latency by the time required for
synchronization. Read-only transactions and queries are executed
locally, without any measurable overhead.
To ensure high availability and fault tolerance of the cluster,
multimaster uses three-phase commit protocol
and heartbeats for failure discovery. A multi-master cluster of N
nodes can continue working while the majority of the nodes are
alive and reachable by other nodes. To be configured with multimaster, the cluster must include at least two nodes. In most cases, three
cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you do not typically need more than five cluster nodes.
When a failed node
is reconnected to the cluster, multimaster can automatically
fast-forward the node to the actual state based on the
Write-Ahead Log (WAL) data in the corresponding replication slot. If WAL data is no longer available for the time when the node was excluded from the cluster, you can restore the node using pg_basebackup.
When using multimaster, make sure to take its replication restrictions into account. For details, see Section F.30.1.
To learn more about the multimaster internals, see
Section F.30.2.
The multimaster extension takes care of the database replication in a fully automated way. You can perform write transactions on any node and work with temporary tables on each cluster node simultaneously. However, make sure to take the following replication restrictions into account:
multimaster can only replicate one database
per cluster, which is specified in the multimaster.conn_strings variable. If you try to connect to a different database, multimaster will return a corresponding error message.
The replicated tables must have primary keys or replica identity. Otherwise,
multimaster will not allow replication
because of the logical replication restrictions. Unlogged tables are not replicated, as in the standard PostgreSQL.
You can enable replication
of tables without primary keys by setting the multimaster.ignore_tables_without_pk variable to false. However, take into account that
multimaster does not allow update operations on such tables.
Isolation level. The multimaster extension
supports read committed and repeatable read isolation levels. Serializable isolation level is currently not supported.
Using repeatable read isolation level increases
the probability of serialization failure at commit time. Unlike in standard PostgreSQL, the read committed level can also cause serialization failures on a multi-master cluster.
When performing a write transaction, multimaster blocks the affected objects only on the node on which the transaction is performed. However, since write transactions are allowed on all nodes, other transactions can try to change the same objects on the neighbor nodes at the same time. In this case, the replication of the first transaction can fail because the affected objects on the neighbor nodes are already blocked by another transaction. Similarly, the latter transaction cannot be replicated to the first node. In this case, a distributed deadlock occurs. As a result, one of the transactions is automatically rolled back and needs to be repeated. The application must be ready to retry transactions.
If your typical workload has too many rollbacks, it is recommended to use the read committed isolation level. However, read committed still does not guarantee the absence of deadlocks on a multi-master cluster. If using the read committed level does not help, you can try directing all the write transactions to a single node.
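Since the application must be prepared to retry aborted transactions, the retry logic can be sketched generically. The following is a minimal illustration in Python; SerializationFailure here stands in for a real database error with SQLSTATE 40001 (serialization_failure), and the transaction callable is a placeholder for your actual database work:

```python
class SerializationFailure(Exception):
    """Stands in for a database error with SQLSTATE 40001 (serialization_failure)."""

def run_with_retry(txn, max_attempts=5):
    """Run a transaction callable, retrying on serialization failures.

    On a multi-master cluster, a commit may be aborted because of a
    concurrent update on another node, so the application retries.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up after the last attempt

# Simulated transaction that fails twice before succeeding,
# the way a transaction hitting concurrent updates might.
attempts = {"n": 0}
def transfer():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "committed"

print(run_with_retry(transfer))  # prints "committed" after two retries
```

In a real application, the except clause would match the driver's serialization-failure error class, and the transaction would be rolled back before each retry.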
Sequence generation. To avoid conflicts between unique identifiers on different nodes, multimaster modifies the default behavior of sequence generators. For each node, ID generation is started with the node number and is incremented by the number of nodes. For example, in a three-node cluster, 1, 4, and 7 IDs are allocated to the objects written onto the first node, while 2, 5, and 8 IDs are reserved for the second node. If you change the number of nodes in the cluster, the incrementation interval for new IDs is adjusted accordingly. Thus, the generated sequence values are not monotonic.
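The allocation scheme can be pictured as a per-node sequence that starts at the node number and increments by the node count. The following is an illustration only; multimaster adjusts the real generators internally, and the sequence names are hypothetical:

```sql
-- Illustration: modeling ID allocation in a three-node cluster.
CREATE SEQUENCE node1_ids START WITH 1 INCREMENT BY 3;  -- yields 1, 4, 7, ...
CREATE SEQUENCE node2_ids START WITH 2 INCREMENT BY 3;  -- yields 2, 5, 8, ...
```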
DDL replication. While multimaster
replicates data on the logical level, DDL is replicated on the
statement level, which results in distributed commits of the same
statement on different nodes. As a result, complex DDL
scenarios, such as stored procedures and temporary tables, may
work differently as compared to the standard PostgreSQL.
Commit latency. In the current implementation of logical
replication, multimaster sends data to subscriber nodes only after the
local commit, so you have to wait for transaction processing twice: first on the local node,
and then on all the other nodes simultaneously. In the case of a heavy-write transaction, this may result in a noticeable delay.
If you have any data that must be present on one of the nodes only, you can exclude a particular table from replication, as follows:
SELECT * FROM mtm.make_table_local('table_name');
Since each server in a multi-master cluster can accept writes, any server can abort a
transaction because of a concurrent update — in the same way as it
happens on a single server between different backends. To ensure
high availability and data consistency on all cluster nodes,
multimaster uses logical replication and the three-phase E3PC commit protocol.
When Postgres Pro Enterprise loads the multimaster shared
library, multimaster sets up a logical
replication producer and consumer for each node, and hooks into
the transaction commit pipeline. The typical data replication
workflow consists of the following phases:
PREPARE phase.
multimaster captures and implicitly
transforms each COMMIT statement to a
PREPARE statement. All the nodes that get
the transaction via the replication protocol (the
cohort nodes) send their vote for approving or
declining the transaction to the arbiter process on the
initiating node. This ensures that all the cohort nodes can accept
the transaction, and no write conflicts occur. For details on
PREPARE transactions support in PostgreSQL,
see the
PREPARE
TRANSACTION topic.
PRECOMMIT phase. If all the cohort nodes approve
the transaction, the arbiter process sends a
PRECOMMIT message to all the cohort nodes
to express an intention to commit the transaction. The cohort
nodes respond to the arbiter with the
PRECOMMITTED message. In case of a failure,
all the nodes can use this information to complete the
transaction using a quorum-based voting procedure.
COMMIT phase. If
PRECOMMIT is successful, the arbiter
commits the transaction to all nodes.
multimaster currently supports the read committed and repeatable read isolation levels only, which can cause unexpected serialization failures in your workload. For details, see Section F.30.1.
If a node crashes or gets disconnected from the cluster between
the PREPARE and COMMIT
phases, the PRECOMMIT phase ensures that the
surviving nodes have enough information to complete the prepared
transaction. The PRECOMMITTED messages help
avoid the situation when the crashed node has already committed
or aborted the transaction, but has not notified the other nodes
about the transaction status. In a two-phase commit (2PC), such a
transaction would block resources (hold locks) until the crashed
node recovers. Resolving it without this information could lead to data
inconsistencies once the failed node is recovered. For example,
the failed node might have committed the transaction while a
surviving node aborted it.
To complete the transaction, the arbiter must receive a response
from the majority of the nodes. For example, for a cluster of 2N+1 nodes, at least N+1 responses are required. Thus, multimaster ensures that
your cluster is available for reads and writes while the majority
of the nodes are connected, and no data inconsistencies occur in
case of a node or connection failure. For details on the failure
detection mechanism, see
Section F.30.2.2.
Since multimaster allows writes to each node,
it has to wait for responses about transaction acknowledgement
from all the other nodes. Without special actions in case of a
node failure, each commit would have to wait until the failed node
recovery. To deal with such situations,
multimaster periodically sends heartbeats to
check the node state and the connectivity between nodes. When several
heartbeats to the node are lost in a row, this node is kicked out
of the cluster to allow writes to the remaining alive nodes. You
can configure the heartbeat frequency and the response timeout in
the multimaster.heartbeat_send_timeout and
multimaster.heartbeat_recv_timeout parameters,
respectively.
An alive node cannot distinguish a failed
node that stopped serving requests from a network-partitioned node
that database users can still access, but that is unreachable for
other nodes. To avoid conflicting writes to nodes in different
network partitions, multimaster only allows
writes to the nodes that see the majority of other nodes.
For example, suppose a five-node multi-master cluster experienced
a network failure that split the network into two isolated
subnets, with two and three cluster nodes. Based on heartbeats
propagation information, multimaster will
continue accepting writes at each node in the bigger partition,
and deny all writes in the smaller one. Thus, a cluster consisting
of 2N+1 nodes can tolerate N node failures and stay alive if any
N+1 nodes are alive and connected to each other.
In case of a partial network split when different nodes have
different connectivity, multimaster finds a
fully connected subset of nodes and switches off other nodes. For
example, in a three-node cluster, if node A can access both B and
C, but node B cannot access node C, multimaster
isolates node C to ensure data consistency on nodes A and B.
Each node maintains a data structure that keeps the information about the state of all
nodes in relation to this node. You can get this data in the
mtm.get_nodes_state() view.
When a failed node connects back to the cluster,
multimaster starts automatic recovery:
The reconnected node selects a random cluster node and starts catching up with the current state of the cluster based on the Write-Ahead Log (WAL).
When the node gets synchronized up to the minimum recovery
lag, all the cluster nodes get locked for write transactions to allow the
recovery process to finish. By default, the minimum recovery
lag is 100KB. You can change this value in the
multimaster.min_recovery_lag variable.
When the recovery is complete, multimaster
promotes the reconnected node to the online state and
includes it into the replication scheme.
Automatic recovery is only possible if the failed node WAL lag
behind the working ones does not exceed the
multimaster.max_recovery_lag value. When the
WAL lag is bigger than
multimaster.max_recovery_lag, you can manually
restore the node from one of the working nodes using
pg_basebackup.
To use multimaster, you need to install
Postgres Pro Enterprise on all nodes of your cluster. Postgres Pro Enterprise includes all the required dependencies and
extensions.
After installing Postgres Pro Enterprise on all nodes, you need to
configure the cluster with multimaster.
Suppose
you are setting up a cluster of three nodes, with
node1, node2, and
node3 domain names. First, set up the database to be replicated, and make sure you have a user with superuser rights to perform replication:
If you are starting from scratch, initialize a cluster,
create an empty database mydb and a new
user myuser, on each node of the cluster. For details, see Section 17.2.
If you already have a database mydb
running on node1, initialize
new nodes from the working node using
pg_basebackup. On behalf of myuser, run the following command on each node you are going to add:
pg_basebackup -D datadir -h node1 -U myuser
where datadir is the directory containing the database cluster. This directory is specified at the cluster initialization stage, or set in the PGDATA environment variable.
For details on using pg_basebackup, see pg_basebackup.
Once the database is set up, complete the following steps on each cluster node:
Modify the postgresql.conf configuration
file, as follows:
Add multimaster to the shared_preload_libraries variable:
shared_preload_libraries = 'multimaster'
If the shared_preload_libraries variable is already defined in postgresql.auto.conf, you will need to modify its value using the ALTER SYSTEM command. For details, see Section 18.1.2.
Specify the transaction isolation level for your cluster. multimaster currently supports read committed and repeatable read isolation levels.
default_transaction_isolation = 'read committed'
Using repeatable read isolation level increases
the probability of serialization failure at commit time. If such cases are not handled by your application, you are recommended to use read committed isolation level.
Set up PostgreSQL parameters related to replication.
wal_level = logical
max_connections = 100
max_prepared_transactions = 300
max_wal_senders = 10        # at least the number of nodes
max_replication_slots = 10  # at least the number of nodes
You must change the replication level to
logical as
multimaster relies on logical
replication. For a cluster of N nodes, enable at least N
WAL sender processes and replication slots. Since
multimaster implicitly adds a
PREPARE phase to each
COMMIT transaction, make sure to set
the number of prepared transactions to N*max_connections.
Otherwise, prepared transactions may be queued.
Make sure you have enough background workers allocated for each node:
max_worker_processes = 250
For example, for a three-node cluster with
max_connections = 100,
multimaster may need up to 206
background workers at peak times: 200 workers for
connections from the neighbor nodes, two workers for
WAL sender processes, two workers for WAL receiver
processes, and two workers for the arbiter sender and
receiver processes. When setting this parameter, remember
that other modules may also use background workers at the
same time.
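For the three-node example above, the estimate can be written out as a postgresql.conf fragment (values are illustrative and assume max_connections = 100):

```
# Three-node cluster with max_connections = 100:
#   2 neighbor nodes * 100 connections = 200 workers
#   2 WAL senders + 2 WAL receivers    =   4 workers
#   arbiter sender + receiver          =   2 workers
#   peak total                         = 206 workers
max_worker_processes = 250   # headroom for other modules
```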
Add multimaster-specific options:
multimaster.max_nodes = 3   # cluster size
multimaster.node_id = 1     # the 1-based index of this node in the cluster
multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2, dbname=mydb user=myuser host=node3'
                            # comma-separated list of connection strings
                            # to neighbor nodes
The multimaster.max_nodes variable defines the maximum cluster size. If you plan to add new nodes to your cluster, the multimaster.max_nodes value should exceed the initial number of nodes. In this case, you can add new nodes without restarting Postgres Pro Enterprise until the specified number of nodes is reached.
In most cases, three cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you do not typically need more than five cluster nodes.
If you would like to change the default connection settings for a cluster node, you can add other connection parameters to the corresponding connection string in the multimaster.conn_strings variable. If you change the default port on which the arbiter process listens for connections, you must specify this port in the arbiter_port parameter. For details, see multimaster.arbiter_port and multimaster.conn_strings.
The
multimaster.node_id variable takes natural
numbers starting from 1, without any gaps in numbering.
For example, for a cluster of five nodes, set node IDs
to 1, 2, 3, 4, and 5. In the
multimaster.conn_strings variable, make sure to
list the nodes in the order of their IDs. The
multimaster.conn_strings variable must be the
same on all nodes.
Depending on your network environment and usage patterns, you
may want to tune other multimaster
parameters. For details, see
Section F.30.3.2.
Modify the pg_hba.conf file to allow replication to each cluster node on behalf of myuser.
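For example, assuming the cluster nodes share the hypothetical subnet 10.0.0.0/24, the pg_hba.conf entries could look as follows (the address range and authentication method are placeholders; adjust them for your environment):

```
host    replication    myuser    10.0.0.0/24    md5
host    mydb           myuser    10.0.0.0/24    md5
```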
Restart PostgreSQL:
pg_ctl -D datadir -l pg.log start
When Postgres Pro Enterprise is started on all nodes, connect to any node
and create the multimaster extension:
psql -h node1
CREATE EXTENSION multimaster;
The CREATE EXTENSION query is replicated to all the cluster nodes.
To ensure that multimaster is enabled, check
the mtm.get_cluster_state() view:
SELECT * FROM mtm.get_cluster_state();
If liveNodes is equal to
allNodes, your cluster is successfully
configured and ready to use.
While you can use multimaster in the default
configuration, you may want to tune several parameters for faster
failure detection or more reliable automatic recovery.
To check availability of the neighbor nodes,
multimaster periodically sends heartbeat
packets to all nodes. You can define the timeout for failure detection with the following variables:
The multimaster.heartbeat_send_timeout
variable defines the time interval between the
heartbeats. By default, this variable is set to 1000ms.
The multimaster.heartbeat_recv_timeout
variable sets the timeout for the response. If no heartbeats are
received during this time, the node is assumed to be
disconnected and is excluded from the cluster. By default,
this variable is set to 10000ms.
It's a good idea to set
multimaster.heartbeat_send_timeout based on
typical ping latencies between the nodes. A small recv/send ratio
decreases the time of failure detection, but increases the
probability of false-positive failure detection. When setting
this parameter, take into account the typical packet loss ratio
between your cluster nodes.
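For example, for a network with a typical inter-node ping latency of a few milliseconds, the defaults are a reasonable starting point (the values shown below are the defaults; adjust the recv/send ratio to trade detection speed against false positives):

```
multimaster.heartbeat_send_timeout = 1000    # ms between heartbeats
multimaster.heartbeat_recv_timeout = 10000   # ms of silence before the node is excluded
```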
If a cluster node fails, multimaster can
automatically restore it based on the WAL collected
on other cluster nodes. To control the recovery settings, use the
following variables:
multimaster.min_recovery_lag — sets the
minimal WAL lag between the node to be restored and the current cluster state.
By default, multimaster.min_recovery_lag is set to 100KB.
When the disconnected node is fast-forwarded up to the
multimaster.min_recovery_lag threshold,
multimaster stops all new commits to the
alive nodes until the node fully catches up with the
current state of the cluster. When the data is fully synchronized,
the disconnected node is promoted to the online state, and
the cluster resumes its work.
multimaster.max_recovery_lag — sets the
maximum size of WAL. Upon reaching the
multimaster.max_recovery_lag threshold,
WAL for the disconnected node is overwritten. At this
point, automatic recovery is no longer possible. In this case, you can restore the node manually by cloning the data from one of the alive nodes using pg_basebackup.
By default, multimaster.max_recovery_lag is
set to 100MB. Setting
multimaster.max_recovery_lag to a larger
value increases the timeframe for automatic recovery, but
requires more disk space for WAL collection.
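For example, to widen the automatic recovery window at the cost of extra disk space for WAL, you could raise the upper threshold (illustrative values; both units are bytes):

```
multimaster.min_recovery_lag = 100000        # 100KB, the default
multimaster.max_recovery_lag = 1000000000    # 1GB instead of the default 100MB
```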
multimaster provides several views to check the
current cluster state.
To check node-specific information, use mtm.get_nodes_state():
SELECT * FROM mtm.get_nodes_state();
To check the status of the whole cluster, use the
mtm.get_cluster_state() view:
SELECT * FROM mtm.get_cluster_state();
For details on all the returned information, see Section F.30.5.2.
With the multimaster extension, you can add or drop cluster nodes without
stopping the database service.
To add a new node, you need to change the cluster configuration on alive nodes, load all the data to the new node using pg_basebackup, and start the node.
Suppose we have a working cluster of three nodes, with
node1, node2, and
node3 domain names. To add
node4, follow these steps:
Check whether the current number of cluster nodes has reached the value specified in the
multimaster.max_nodes variable. If this value is reached, increase
the multimaster.max_nodes value on each node and restart all nodes.
You can restart the nodes one by one, without stopping the database.
If the maximum number of nodes is not reached, proceed to the next step.
Figure out the required connection string to
access the new node. For example, for the database
mydb, user myuser, and
the new node node4, the connection string
can be "dbname=mydb user=myuser host=node4".
For details, see multimaster.conn_strings.
In psql connected to any alive node, run:
SELECT mtm.add_node('dbname=mydb user=myuser host=node4');
This command changes the cluster configuration on all nodes and starts replication slots for the new node.
Connect to the new node and clone all the data from one of the alive nodes to the new node:
pg_basebackup -D datadir -h node1 -x
pg_basebackup copies the entire data
directory from node1, together with
configuration settings.
Update postgresql.conf settings on
node4:
multimaster.node_id = 4
multimaster.conn_strings = 'dbname=mydb user=myuser host=node1, dbname=mydb user=myuser host=node2,
dbname=mydb user=myuser host=node3, dbname=mydb user=myuser host=node4'
Start PostgreSQL on the new node:
pg_ctl -D datadir -l pg.log start
All the cluster nodes get locked for write transactions until the new node retrieves all the updates that happened after you started making a base backup.
When data recovery is complete, multimaster promotes the new node to the online state and includes it into the replication scheme.
To ensure that the new configuration is loaded in the case of PostgreSQL restart, update configuration settings on all the cluster nodes:
Change multimaster.conn_strings to include the new node.
Make sure the pg_hba.conf file allows
replication to the new node.
To exclude a node from the cluster, use the
mtm.stop_node() function. For example, to
exclude node 3, run the following command on any other node:
SELECT mtm.stop_node(3);
This disables replication slots for node 3 on all cluster nodes and stops replication to this node.
If you simply shut down a node, it is also excluded
from the cluster. However, all transactions in the cluster
are frozen until the other nodes detect that the node is offline.
This time interval is defined by the multimaster.heartbeat_recv_timeout parameter.
The multimaster extension can automatically restore a failed node if the WAL is still available for the time when the node was disconnected from the cluster. However, if the data updates on the alive nodes exceed the allowed WAL size specified in the multimaster.max_recovery_lag variable, automatic recovery becomes impossible. In this case, you can manually restore the failed node.
Suppose node2 got disconnected from your three-node cluster and needs to be manually restored. The typical workflow is as follows:
In psql connected to any alive node, create a new replication slot for the disconnected node with the following command:
SELECT mtm.recover_node(2);
where 2 is the ID of the disconnected node specified in the multimaster.node_id variable.
Connect to node2 and clone all the data from one of the alive nodes:
pg_basebackup -D datadir -h node1 -x
pg_basebackup copies the entire data
directory from node1, together with
configuration settings.
Start PostgreSQL on the restored node:
pg_ctl -D datadir -l pg.log start
All the cluster nodes get locked for write transactions until the restored node retrieves all the updates that happened after you started making a base backup.
When data recovery is complete, multimaster promotes the new node to the online state and includes it into the replication scheme.
multimaster.node_id
Node ID — a unique natural number identifying the node of a multi-master cluster. You must start node numbering from 1. There must be no gaps in numbering. For example, for a cluster of five nodes, set node IDs to 1, 2, 3, 4, and 5.
multimaster.conn_strings
Connection strings for each node of a multi-master cluster, separated by commas.
The multimaster.conn_strings parameter must be identical on all nodes.
Each connection string must include the name of the database to replicate
and the cluster node domain name. For example, 'dbname=mydb
host=node1, dbname=mydb host=node2, dbname=mydb host=node3'.
Optionally, you can add other connection parameters to change the default connection settings.
Connection strings must appear in the order of the node IDs
specified in the multimaster.node_id variable:
the connection string for the i-th node must be in the i-th position.
If you specify a custom port in the multimaster.arbiter_port variable,
you must provide this value in the arbiter_port
parameter in the connection string for the corresponding node.
multimaster.max_nodes
The maximum number of nodes allowed in the cluster. If you plan to add new nodes to your cluster, the multimaster.max_nodes value should exceed the initial number of nodes. In this case, you can add new nodes without restarting Postgres Pro Enterprise until the specified number of nodes is reached. In most cases, three cluster nodes are enough to ensure high availability. Since the data on all cluster nodes is the same, you do not typically need more than five cluster nodes. The maximum possible number of nodes is limited to 64.
Default: the number of nodes specified in the multimaster.conn_strings variable
multimaster.arbiter_port
Port for the arbiter process to listen on. If you change the default value, you must specify this value in the arbiter_port
parameter in the connection string for the corresponding node.
Default: 5433
multimaster.heartbeat_send_timeout
Time interval between heartbeat messages, in milliseconds. An arbiter process broadcasts heartbeat messages to all nodes to detect connection problems.
Default: 1000
multimaster.heartbeat_recv_timeout
Timeout, in milliseconds. If no heartbeat message is received from the node within this timeframe, the node is excluded from the cluster.
Default: 10000
multimaster.min_recovery_lag
Minimal WAL lag between the node to be restored and the current cluster state, in bytes. When this threshold is reached during node recovery, the cluster is locked for write transactions until the recovery is complete.
Default: 100000
multimaster.max_recovery_lag
Maximal WAL lag
size, in bytes. When a node is disconnected from the cluster, other
nodes copy WALs for all new transactions into the replication slot of
this node. Upon reaching the
multimaster.max_recovery_lag value, the
replication slot for the disconnected node is dropped to avoid
overflow. At this point, automatic recovery of the node is no longer
possible. In this case, you can restore the node manually by cloning
the data from one of the alive nodes using
pg_basebackup or a similar tool. If you set this
variable to zero, the replication slot will not be dropped.
Default: 100000000
multimaster.ignore_tables_without_pk
Boolean.
This variable enables/disables replication of tables without primary
keys. By default, such replication is
disabled because of the logical replication restrictions. To enable
replication of tables without primary keys, you can set this variable to false. However, take into
account that multimaster does not allow update
operations on such tables.
Default: true
multimaster.cluster_name
Name of the cluster. If
you define this variable when setting up the cluster, multimaster checks that
the cluster name is the same for all the cluster nodes.
multimaster.break_connection
Break connection with clients connected to the node if this node disconnects
from the cluster. If this variable is set to false, the client stays
connected to the node but receives an error that the node is in minority.
Default: false
multimaster.major_node
Node with this flag continues working even if it cannot access the majority of other nodes.
This is needed to break the symmetry if there is an even number of alive nodes in the cluster.
For example, a cluster of three nodes continues working if one of the nodes crashes.
If the connection between the two remaining nodes is then lost, only the node with multimaster.major_node = true continues working.
This parameter should be used with caution. Only one node in the cluster
can have this parameter set to true. When set to true on several
nodes, this parameter can cause the split-brain problem.
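For example, to break the symmetry in a hypothetical cluster that can degrade to two alive nodes, you could designate node 1 as the major node. Set this on node 1 only; all other nodes keep the default:

```
multimaster.major_node = true
```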
multimaster.max_workers
The maximum number of walreceiver workers on this server.
This parameter should be used with caution. If the number of simultaneous transactions in the whole cluster is bigger than the provided value, it can lead to undetected deadlocks.
multimaster.trans_spill_threshold
The maximal size of a transaction, in MB. When this threshold is reached, the transaction is written to disk.
Default: 100
mtm.get_nodes_state()
Shows the status of all nodes in the cluster. Returns a tuple of the following values:
id, integer
Node ID.
enabled, boolean
Shows whether the node is excluded from the cluster. The node can only be disabled if responses to heartbeats are not received within the heartbeat_recv_timeout time interval. When the node starts responding to heartbeats, multimaster can automatically restore the node and switch it back to the enabled state.
Automatic recovery is only possible if the replication slot is still active. Otherwise, you can restore the node manually.
connected, boolean
Shows whether the node is connected to the WAL sender.
slot_active, boolean
Shows whether the node has an active replication slot. For a disabled node, the slot remains active until the max_recovery_lag value is reached.
stopped, boolean
Shows whether replication to this node was stopped by the mtm.stop_node() function. A stopped node acts as a disabled one, but cannot be automatically recovered. Call mtm.recover_node() to re-enable such a node.
catchUp, boolean
During the node recovery, shows whether the data is recovered up to the min_recovery_lag value.
slotLag, bigint
The size of WAL data that the replication slot holds for a disabled/stopped node. The slot is dropped when slotLag reaches the max_recovery_lag value.
avgTransDelay, bigint
An average commit delay caused by this node, in microseconds.
lastStatusChange, timestamp
Last time when the node changed its status (enabled/disabled).
oldestSnapshot, bigint
The oldest global snapshot existing on this node.
SenderPid, integer
Process ID of the WAL sender.
SenderStartTime, timestamp
WAL sender start time.
ReceiverPid, integer
Process ID of the WAL receiver.
ReceiverStartTime, timestamp
WAL receiver start time.
connStr, text
Connection string to this node.
connectivityMask, bigint
Bitmask representing connectivity to neighbor nodes. Each bit represents a connection to a node.
nHeartbeats, integer
The number of heartbeat responses received from this node.
mtm.collect_cluster_state()
Collects the data returned by the mtm.get_cluster_state() function from all available nodes. For this function to work, in addition to replication connections, pg_hba.conf must allow ordinary connections to the node with the specified connection string.
mtm.get_cluster_state()
Shows the status of the multimaster extension. Returns a tuple of the following values:
status, text
Node status. Possible values are: Initialization, Offline, Connected, Online, Recovery, Recovered, InMinor, OutOfService.
disabledNodeMask, bigint
Bitmask of disabled nodes.
disconnectedNodeMask, bigint
Bitmask of disconnected nodes.
catchUpNodeMask, bigint
Bitmask of nodes that completed the recovery.
liveNodes, integer
Number of enabled nodes.
allNodes, integer
Number of nodes in the cluster. The majority of alive nodes is calculated based on this parameter.
nActiveQueries, integer
Number of queries being currently processed on this node.
nPendingQueries, integer
Number of queries waiting for execution on this node.
queueSize, bigint
Size of the pending query queue, in bytes.
transCount, bigint
The total number of replicated transactions processed by this node.
timeShift, bigint
Global snapshot shift caused by unsynchronized clocks on nodes, in microseconds.
recoverySlot, integer
The node from which a failed node gets data updates during automatic recovery.
xidHashSize, bigint
Size of xid2state hash.
gidHashSize, bigint
Size of gid2state hash.
oldestXid, bigint
The oldest transaction ID on this node.
configChanges, integer
Number of state changes (enabled/disabled) since the last reboot.
stalledNodeMask, bigint
Bitmask of nodes for which replication slots were dropped.
stoppedNodeMask, bigint
Bitmask of nodes that were stopped by mtm.stop_node().
lastStatusChange, timestamp
Timestamp of the last state change.
mtm.add_node(conn_str text)
Adds a new node to the cluster.
Arguments:
conn_str — connection string for the
new node. For example, for the database
mydb, user myuser,
and the new node node4, the connection
string is
"dbname=mydb user=myuser host=node4".
Type: text
mtm.drop_node(node integer, drop_slot bool default false)
Excludes a node from the cluster.
Arguments:
node — ID of the node to be dropped
that you specified in the
multimaster.node_id variable.
Type:
integer
drop_slot — Optional. Defines whether
the replication slot should be dropped together with the
node. Set this option to true if you do not plan to
restore the node in the future.
Type: boolean
Default: false
mtm.recover_node(node integer)
Creates a replication slot for the node that was previously dropped together with its slot.
Arguments:
node — ID of the node to be restored.
mtm.make_table_local(relation regclass)
Stops replication for the specified table.
Arguments:
relation — The table you would like to
exclude from the replication scheme.
Type: regclass
The multimaster extension currently passes 162
of 166 PostgreSQL regression tests. We are currently working on
providing full compatibility with the standard PostgreSQL.
Postgres Professional, Moscow, Russia.
The replication mechanism is based on logical decoding and an
earlier version of the pglogical extension
provided for the community by the 2ndQuadrant team.
The three-phase E3PC commit protocol is based on the following works:
Idit Keidar, Danny Dolev. Increasing the Resilience of Distributed and Replicated Database Systems.
Tim Kempster, Colin Stirling, Peter Thanisch. A More Committed Quorum-Based Three Phase Commit Protocol.