This section describes the basics of backup and recovery in Shardman.
You can use the backup command of the shardmanctl tool to perform a full, binary-consistent backup of a Shardman cluster to a directory on the local host, and the recover command to recover the cluster from this backup.
You can also use the probackup backup command of the shardmanctl tool to perform a binary-consistent backup of a Shardman cluster to a backup repository on the local host, and the probackup restore command to recover the cluster from any backup in the repository.
The pg_probackup utility, which creates consistent full and incremental backups for PostgreSQL, has been integrated into shardman-utils. shardman-utils follows the pg_probackup approach of storing backups in a pre-created repository. In addition, the pg_probackup commands archive-push and archive-get are used to deliver WAL files to the backup repository and to retrieve them from it during recovery. Backup and restore operations use passwordless SSH connections between the cluster nodes and the backup node.
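For orientation, the following Python sketch shows the general shape of WAL commands built on the documented pg_probackup archive-push and archive-get subcommands in remote (SSH) mode. It is an illustration only: Shardman generates its own commands, which may differ, and the binary path, repository path, instance name, and host in the usage example are hypothetical.

```python
import shlex

# Illustration only: Shardman generates its own archive_command and
# restore_command, which may differ from these. PostgreSQL substitutes
# %p (path of the WAL file) and %f (its file name) before running them
# in a shell, so those placeholders are deliberately left unquoted.


def _probackup_prefix(binary: str, subcommand: str, backup_dir: str, instance: str) -> list[str]:
    """Common start of a pg_probackup WAL command, with shell-quoted values."""
    return [
        shlex.quote(binary),
        subcommand,
        "-B",
        shlex.quote(backup_dir),
        f"--instance={shlex.quote(instance)}",
    ]


def _remote_options(backup_host: str, ssh_user: str) -> list[str]:
    """pg_probackup remote mode: reach the repository on backup_host over SSH."""
    return [
        f"--remote-host={shlex.quote(backup_host)}",
        f"--remote-user={shlex.quote(ssh_user)}",
    ]


def archive_command(binary: str, backup_dir: str, instance: str,
                    backup_host: str, ssh_user: str = "postgres") -> str:
    """archive_command for a node: push each completed WAL segment to the repository."""
    parts = _probackup_prefix(binary, "archive-push", backup_dir, instance)
    parts += ["--wal-file-name=%f", *_remote_options(backup_host, ssh_user)]
    return " ".join(parts)


def restore_command(binary: str, backup_dir: str, instance: str,
                    backup_host: str, ssh_user: str = "postgres") -> str:
    """restore_command for a node: fetch a requested WAL segment from the repository."""
    parts = _probackup_prefix(binary, "archive-get", backup_dir, instance)
    parts += ["--wal-file-path=%p", "--wal-file-name=%f",
              *_remote_options(backup_host, ssh_user)]
    return " ".join(parts)
```

For example, archive_command("/opt/pgpro/sdm-14/bin/pg_probackup", "/var/backup/shardman", "shard-1", "backup-host") yields a single shell command line; quoting only the variable values keeps paths with spaces safe while leaving %p and %f for PostgreSQL to fill in.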
The Shardman cluster configuration parameter enable_csn_snapshot must be set to on. This parameter is necessary for the cluster backup to be consistent. If this option is disabled, consistent backup is not possible.
For consistent visibility of distributed transactions, Shardman uses global snapshots based on physical clocks (Clock-SI). A consistent snapshot for backups can be obtained in a similar way; the only additional requirement is that the time corresponding to the global snapshot must be mapped to a set of LSNs, one for each node. Such a set of consistent LSNs in a cluster is called a Syncpoint. By taking a Syncpoint and extracting the LSN for each node in the cluster from it, we can make a backup of each node that is guaranteed to contain that LSN. We can then recover each node to this LSN using the point-in-time recovery (PITR) mechanism.
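The Syncpoint idea can be sketched in Python. The sketch assumes PostgreSQL's textual LSN notation X/Y (the high and low 32-bit halves of a 64-bit WAL position, in hexadecimal); the Syncpoint class, the group names, and the received_up_to mapping are hypothetical and do not reflect Shardman's internal data structures. It reports which replication groups do not yet have WAL reaching their Syncpoint LSN, which is the condition each per-node backup must satisfy before it can be recovered to that LSN with PITR.

```python
from dataclasses import dataclass


def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Convert a 64-bit WAL position back to PostgreSQL's X/Y notation."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


@dataclass
class Syncpoint:
    """Hypothetical model of a Syncpoint: one consistent LSN per replication group."""
    lsns: dict[str, int]


def groups_short_of_syncpoint(syncpoint: Syncpoint, received_up_to: dict[str, int]) -> list[str]:
    """Return the groups whose backup plus received WAL does not yet reach their Syncpoint LSN.

    received_up_to maps a group name to the WAL position available for it;
    a missing group counts as having no WAL at all.
    """
    return sorted(
        group
        for group, target in syncpoint.lsns.items()
        if received_up_to.get(group, 0) < target
    )
```

Comparing LSNs as integers rather than as strings matters: "0/FFFFFFFF" sorts after "1/A0" as text but is the earlier WAL position.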
The backup and probackup commands use different
mechanisms to create backups. The backup command is based on the standard utilities
pg_basebackup and pg_receivewal. The probackup command uses the pg_probackup utility and its
options to create a cluster backup.
This section describes the basics of backup and recovery in Shardman with the backup command.
To back up and restore a Shardman cluster with the backup command, the following requirements must be met:
The Shardman cluster configuration parameter enable_csn_snapshot must be on. This parameter is necessary for the cluster backup to be consistent. If this option is disabled, consistent backup is not possible;
Shardman utilities must be installed into /opt/pgpro/sdm-14/bin on the backup host;
pg_basebackup must be installed into /opt/pgpro/sdm-14/bin on the backup host and on each cluster node;
The postgres Linux user and group must be created on the backup host;
Passwordless SSH must be configured for the postgres Linux user between the backup host and each Shardman cluster node;
The backup folder must be created;
The postgres Linux user must be granted access to the backup folder;
The shardmanctl utility must be run as the postgres Linux user.
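Some of the local requirements above can be checked before running shardmanctl. The following Python sketch is a hypothetical pre-flight helper, not part of Shardman: it verifies that the listed tools are executable in the given directory, that the backup folder exists and is writable, and that the effective user is postgres. It does not check enable_csn_snapshot, SSH access, or the tools on cluster nodes.

```python
import os
import pwd
from pathlib import Path


def preflight_problems(bin_dir: Path, backup_dir: Path, tools: list[str],
                       expected_user: str = "postgres") -> list[str]:
    """Hypothetical local pre-flight check for a backup host; returns a list of problems.

    Covers only local conditions: executables in bin_dir, the backup folder,
    and the effective user. SSH access, enable_csn_snapshot, and the tools on
    cluster nodes are not checked.
    """
    problems = []
    for tool in tools:
        path = bin_dir / tool
        if not (path.is_file() and os.access(path, os.X_OK)):
            problems.append(f"{path} is missing or not executable")
    if not backup_dir.is_dir():
        problems.append(f"backup folder {backup_dir} does not exist")
    elif not os.access(backup_dir, os.W_OK | os.X_OK):
        problems.append(f"backup folder {backup_dir} is not writable")
    user = pwd.getpwuid(os.geteuid()).pw_name
    if user != expected_user:
        problems.append(f"running as {user!r} instead of {expected_user!r}")
    return problems
```

For example, preflight_problems(Path("/opt/pgpro/sdm-14/bin"), Path("/var/backup/shardman"), ["shardmanctl", "pg_basebackup"]) returns an empty list when all local checks pass; the backup folder path here is hypothetical.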
shardmanctl conducts a backup task in several steps. The tool:
Takes necessary locks in etcd to prevent concurrent cluster-wide operations.
Connects to a random replication group and locks Shardman metadata tables to prevent modification of foreign servers during the backup.
Creates replication slots in each replication group to ensure that WAL records are not lost.
Dumps Shardman metadata, stored in etcd, to a JSON file in the backup directory.
Concurrently runs pg_basebackup for each replication group, using the replication slots created earlier, to get a backup of each group.
Creates a Syncpoint and uses pg_receivewal to fetch the WAL generated after the base backups finish, until the LSNs extracted from the Syncpoint are reached.
Fixes partial WAL files generated by pg_receivewal and creates the backup description file.
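When pg_receivewal fetches WAL up to each Syncpoint LSN, the last file it needs for a group is the WAL segment containing that LSN, and pg_receivewal names a segment it has not finished receiving with a .partial suffix. The Python sketch below computes a segment file name from an LSN the same way PostgreSQL maps a WAL byte position to a segment (timeline, log id, and segment number as three 8-digit hexadecimal fields). Timeline 1 and the default 16 MB segment size are assumptions; a real cluster may use a different timeline or segment size.

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn: str, timeline: int = 1, seg_size: int = 16 * 1024 * 1024) -> str:
    """Name of the WAL segment file that contains the byte at `lsn`.

    Assumes timeline 1 and the default 16 MB segment size unless told
    otherwise. pg_receivewal writes the segment it is still receiving under
    this name plus a '.partial' suffix.
    """
    if seg_size & (seg_size - 1) or not (1 << 20) <= seg_size <= (1 << 30):
        raise ValueError("segment size must be a power of two between 1 MB and 1 GB")
    segno = parse_lsn(lsn) // seg_size
    # Each 4 GB "log id" holds this many segments.
    segments_per_log = 0x100000000 // seg_size
    return f"{timeline:08X}{segno // segments_per_log:08X}{segno % segments_per_log:08X}"
```

With the defaults, wal_segment_name("0/3000060") is "000000010000000000000003".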
You can restore a backup on the same cluster or on a compatible one. Compatible clusters are those that use the same Shardman version and have the same number of replication groups.
shardmanctl can perform either full restore or metadata-only restore. Metadata-only restore is useful if issues are encountered with the etcd instance, but DBMS data is not corrupted.
During metadata-only restore, shardmanctl restores etcd data from the dump created during the backup.
Restoring metadata to an incompatible cluster can lead to catastrophic consequences, including data loss, since the metadata state can differ from the actual configuration layout. Do not perform metadata-only restore if there were cluster reconfigurations after the backup, such as addition or deletion of nodes, even if the same nodes were added back again.
During a full restore, shardmanctl checks whether the number of replication groups in the target cluster matches the number of replication groups in the backup. This means that you cannot restore to an empty cluster; you first need to add as many replication groups as necessary for their total number to match that of the cluster from which the backup was taken.
You can also restore only a single shard by using the --shard parameter.
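The compatibility rule above can be expressed as a small Python check. ClusterInfo and its fields are hypothetical stand-ins for information Shardman keeps in etcd and in the backup description, and the version strings in the usage below are made up. An empty target cluster fails the check, which is why replication groups must be added first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterInfo:
    """Hypothetical summary of a cluster, or of the cluster recorded in a backup."""
    shardman_version: str
    replication_groups: tuple[str, ...]


def incompatibility_reasons(backup: ClusterInfo, target: ClusterInfo) -> list[str]:
    """Return why `backup` cannot be restored on `target`; an empty list means compatible."""
    reasons = []
    if backup.shardman_version != target.shardman_version:
        reasons.append(
            f"Shardman version differs: backup {backup.shardman_version}, "
            f"target {target.shardman_version}"
        )
    # Only the number of groups must match; their names may differ.
    if len(backup.replication_groups) != len(target.replication_groups):
        reasons.append(
            f"replication group count differs: backup {len(backup.replication_groups)}, "
            f"target {len(target.replication_groups)}"
        )
    return reasons
```

For instance, a backup described as ClusterInfo("14.0", ("shard-1", "shard-2")) is compatible with ClusterInfo("14.0", ("rg-a", "rg-b")) but not with an empty ClusterInfo("14.0", ()).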
shardmanctl conducts full restore in several steps. The tool:
Takes necessary locks in etcd to prevent concurrent cluster-wide operations and tries to assign replication groups in the backup to existing replication groups. If it cannot do this (for example, due to cluster incompatibility), the recovery fails.
Restores part of the etcd metadata: the cluster specification and parts of replication group definitions.
When the correct metadata is in place, runs stolon init in PITR initialization mode with RecoveryTargetName set to the value of the Syncpoint LSN from the backup info file. DataRestoreCommand and RestoreCommand are also taken from the backup info file.
Waits for each replication group to recover.
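The group assignment and PITR settings from the restore steps above can be sketched in Python as follows. Pairing backed-up groups with target groups in sorted name order is an assumption made for illustration; Shardman's actual assignment logic is not described here. The output keys RecoveryTargetName, DataRestoreCommand, and RestoreCommand mirror the names in the text, while SourceGroup, InitMode, and the input keys syncpoint_lsn, data_restore_command, and restore_command are hypothetical.

```python
def plan_restore(backup_groups: dict[str, dict[str, str]],
                 target_groups: list[str]) -> dict[str, dict[str, str]]:
    """Assign backed-up replication groups to target groups and build PITR settings.

    Hypothetical: groups are paired in sorted name order, and the input keys
    syncpoint_lsn, data_restore_command and restore_command stand for values
    read from the backup info file.
    """
    if len(backup_groups) != len(target_groups):
        # Mirrors the rule that restore fails if groups cannot be assigned.
        raise ValueError(
            f"cannot assign {len(backup_groups)} backed-up groups "
            f"to {len(target_groups)} target groups"
        )
    plan = {}
    for source, target in zip(sorted(backup_groups), sorted(target_groups)):
        info = backup_groups[source]
        plan[target] = {
            "SourceGroup": source,   # hypothetical key
            "InitMode": "pitr",      # hypothetical key
            "RecoveryTargetName": info["syncpoint_lsn"],
            "DataRestoreCommand": info["data_restore_command"],
            "RestoreCommand": info["restore_command"],
        }
    return plan
```

Raising before any settings are produced reflects the ordering in the steps above: nothing is initialized until every backed-up group has a target.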
This section describes the basics of backup and recovery in Shardman with the probackup command.
You can use the probackup backup command of the shardmanctl tool to perform a binary backup of a Shardman cluster into the backup repository on the local (backup) host, and the probackup restore command to recover the cluster from the selected backup. Full and incremental (delta) backups are supported.
To back up and restore a Shardman cluster with the probackup command, the following requirements must be met:
The Shardman cluster configuration parameter enable_csn_snapshot must be on. This parameter is necessary for the cluster backup to be consistent. If this option is disabled, consistent backup is not possible;
Shardman utilities must be installed into /opt/pgpro/sdm-14/bin on the backup host;
pg_probackup must be installed into /opt/pgpro/sdm-14/bin on the backup host and on each cluster node;
The postgres Linux user and group must be created on the backup host;
Passwordless SSH must be configured for the postgres Linux user between the backup host and each Shardman cluster node;
The backup folder must be created;
The postgres Linux user must be granted access to the backup folder;
The shardmanctl utility must be run as the postgres Linux user;
The init subcommand, which initializes the backup repository, must be successfully executed on the backup host;
The archive-command add subcommand, which enables archive_command for each replication group to stream WAL into the initialized repository, must be successfully executed on the backup host.
shardmanctl conducts a backup task in several steps. The tool:
Takes necessary locks in etcd to prevent concurrent cluster-wide operations.
Connects to a random replication group and locks Shardman metadata tables to prevent modification of foreign servers during the backup.
Dumps Shardman metadata, stored in etcd, to a JSON file in the backup directory.
Concurrently runs pg_probackup for each replication group to get its backup, relying on the configured archive_command to deliver WAL to the repository.
Creates a Syncpoint and gets the LSN of each replication group from the Syncpoint data structure. Then, for each replication group, uses the pg_probackup archive-push command to push the WAL generated after the backup finished, including the WAL file that contains the Syncpoint LSN.
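The ordering of these backup steps can be sketched in Python, with callables standing in for the real operations. All of the callables are hypothetical placeholders (acquire_lock, for instance, stands for both the etcd locks and the metadata table locks). The sketch only shows the control flow: locks are taken first and always released, per-group backups run concurrently, and WAL is pushed up to each group's Syncpoint LSN only after the Syncpoint is created.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_cluster_backup(
    groups: list[str],
    acquire_lock: Callable[[], None],          # hypothetical: etcd + metadata table locks
    release_lock: Callable[[], None],          # hypothetical
    dump_metadata: Callable[[], None],         # hypothetical: etcd dump to JSON
    backup_group: Callable[[str], str],        # hypothetical: one pg_probackup run, returns an ID
    create_syncpoint: Callable[[], dict[str, str]],   # hypothetical: group -> LSN
    push_wal_up_to: Callable[[str, str], None],       # hypothetical: archive-push up to LSN
    max_workers: int = 4,
) -> dict[str, str]:
    """Run the backup steps in order and return the backup ID of each group."""
    acquire_lock()
    try:
        dump_metadata()
        # Back up all replication groups concurrently; the first failure propagates.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            backup_ids = dict(zip(groups, pool.map(backup_group, groups)))
        # The Syncpoint is taken only after every group's backup has finished.
        syncpoint = create_syncpoint()
        for group in groups:
            push_wal_up_to(group, syncpoint[group])
        return backup_ids
    finally:
        # Locks are released even if any step fails.
        release_lock()
```

Releasing the locks in a finally clause keeps a failed group backup from leaving the cluster blocked for other cluster-wide operations.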
You can restore a backup on the same cluster or on a compatible one. Compatible clusters are those that use the same Shardman version and have the same number of replication groups.
shardmanctl can perform either full restore or metadata-only restore. Metadata-only restore is useful if issues are encountered with the etcd instance, but DBMS data is not corrupted.
During metadata-only restore, shardmanctl restores etcd data from the dump created during the backup.
Restoring metadata to an incompatible cluster can lead to catastrophic consequences, including data loss, since the metadata state can differ from the actual configuration layout. Do not perform metadata-only restore if there were cluster reconfigurations after the backup, such as addition or deletion of nodes, even if the same nodes were added back again.
During a full restore, shardmanctl checks whether the number of replication groups in the target cluster matches the number of replication groups in the backup. This means that you cannot restore to an empty cluster; you first need to add as many replication groups as necessary for their total number to match that of the cluster from which the backup was taken.
You can also restore only a single shard by using the --shard parameter.
shardmanctl conducts full restore in several steps. The tool:
Takes necessary locks in etcd to prevent concurrent cluster-wide operations and tries to assign replication groups in the backup to existing replication groups. If it cannot do this (for example, due to cluster incompatibility), the recovery fails.
Restores part of the etcd metadata: the cluster specification and parts of replication group definitions.
When the correct metadata is in place, runs stolon init in PITR initialization mode with RecoveryTargetName set to the value of the Syncpoint LSN from the backup info file. DataRestoreCommand and RestoreCommand are also taken from the backup info file. These commands are generated automatically during the backup phase, so it is not recommended to modify the file containing the Shardman cluster backup description. When the cluster is restored, the WAL files containing the final LSN to restore are requested automatically for each replication group from the backup repository on the remote backup node via the pg_probackup archive-get command.
Waits for each replication group to recover.
Finally, archive_command needs to be enabled again.
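Waiting for every replication group to finish recovery amounts to a polling loop with a deadline. In the Python sketch below, is_recovered is a hypothetical callable; one possible implementation would connect to the group's primary and check that pg_is_in_recovery() returns false. The timeout and polling interval defaults are arbitrary example values.

```python
import time
from typing import Callable


def wait_for_recovery(groups: list[str], is_recovered: Callable[[str], bool],
                      timeout: float = 600.0, poll_interval: float = 5.0) -> None:
    """Poll until every replication group reports recovery completion, or raise TimeoutError.

    is_recovered is a hypothetical per-group probe; groups that have finished
    are not polled again.
    """
    pending = set(groups)
    deadline = time.monotonic() + timeout
    while True:
        pending = {g for g in pending if not is_recovered(g)}
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still recovering: {', '.join(sorted(pending))}")
        time.sleep(poll_interval)
```

Using time.monotonic() rather than wall-clock time keeps the deadline correct even if the system clock is adjusted while the cluster recovers.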