4.5. Backup and Recovery

4.5.1. Cluster backup with pg_basebackup
4.5.2. Cluster recovery from a backup using pg_basebackup
4.5.3. Cluster backup with pg_probackup
4.5.4. Cluster restore from a backup with pg_probackup

This section describes the basics of backup and recovery in Shardman.

You can use the backup command of the shardmanctl tool to perform a full consistent binary backup of a Shardman cluster to a directory on the local host, and the recover command to restore the cluster from this backup.

You can also use the probackup backup command of the shardmanctl tool to perform a full consistent binary backup of a Shardman cluster to the backup repository on the local host, and the probackup restore command to restore the cluster from any backup in the repository.

The pg_probackup utility, which creates consistent full and incremental backups for PostgreSQL, is integrated into shardman-utils. shardman-utils follows the pg_probackup approach of storing backups in a pre-created repository. In addition, the pg_probackup commands archive-get and archive-push are used to transfer WAL files between the cluster nodes and the backup repository. Backup and restore modes use a passwordless ssh connection between the cluster nodes and the backup node.

The Shardman cluster configuration parameter enable_csn_snapshot must be set to on. This parameter is necessary for the cluster backup to be consistent. If it is disabled, a consistent backup is not possible.

For consistent visibility of distributed transactions, the technique of global snapshots based on physical clocks (Clock-SI) is used. A consistent snapshot for backups can be obtained in a similar way, except that the time corresponding to the global snapshot must be mapped to a set of LSNs, one for each node. Such a set of consistent LSNs in a cluster is called a Syncpoint. By getting a Syncpoint and taking the LSN for each node in the cluster from it, we can make a backup of each node that is guaranteed to contain that LSN. We can also recover to this LSN using the point-in-time recovery (PITR) mechanism.
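As an illustration of the idea, the sketch below models a Syncpoint as a mapping from replication group name to LSN and reports the groups whose collected WAL has not yet reached the Syncpoint LSN. The textual LSN form (two hexadecimal numbers separated by a slash) is standard PostgreSQL. The dictionary layout, the group names, the sample values, and the function names are assumptions made for this example and do not reflect the actual Syncpoint data structure.

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def groups_behind_syncpoint(syncpoint: dict, received: dict) -> list:
    """Return the groups whose collected WAL has not reached the Syncpoint LSN.

    Both arguments map a (hypothetical) replication group name to an LSN string.
    """
    behind = []
    for group, target in syncpoint.items():
        have = received.get(group)
        # A group with no WAL position at all is also considered behind.
        if have is None or parse_lsn(have) < parse_lsn(target):
            behind.append(group)
    return sorted(behind)


# Hypothetical values, for illustration only.
syncpoint = {"rg-1": "0/3000060", "rg-2": "0/100"}
received = {"rg-1": "0/3000100", "rg-2": "0/A0"}
print(groups_behind_syncpoint(syncpoint, received))  # ['rg-2']
```

Note that LSNs have to be compared numerically: as plain strings, "0/A0" sorts after "0/100", although the position 0xA0 precedes 0x100.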

The backup and probackup commands use different mechanisms to create backups. The backup command is based on the standard utilities pg_basebackup and pg_receivewal. The probackup command uses the pg_probackup utility and its options to create a cluster backup.

4.5.1. Cluster backup with pg_basebackup

shardmanctl conducts a backup task in several steps. The tool:

  1. Takes necessary locks in etcd to prevent concurrent cluster-wide operations.

  2. Connects to a random replication group and locks Shardman metadata tables to prevent modification of foreign servers during the backup.

  3. Creates replication slots on each replication group to ensure that WAL records are not lost.

  4. Dumps Shardman metadata, stored in etcd, to a JSON file in the backup directory.

  5. To get backups from each replication group, concurrently runs pg_basebackup using the replication slots created earlier.

  6. Creates a Syncpoint and uses pg_receivewal to fetch the WAL generated after the base backups finish, until the LSNs extracted from the Syncpoint data structure are reached.

  7. Fixes partial WAL files generated by pg_receivewal and creates the backup description file.

4.5.2. Cluster recovery from a backup using pg_basebackup

You can restore a backup on the same cluster or on a compatible one. A compatible cluster is one that uses the same Shardman version and has the same number of replication groups.

shardmanctl can perform either full restore or metadata-only restore. Metadata-only restore is useful if issues are encountered with the etcd instance, but DBMS data is not corrupted.

During metadata-only restore, shardmanctl restores etcd data from the dump created during the backup.

Important

Restoring metadata to an incompatible cluster can lead to catastrophic consequences, including data loss, since the metadata state can differ from the actual configuration layout. Do not perform metadata-only restore if there were cluster reconfigurations after the backup, such as addition or deletion of nodes, even if the same nodes were added back again.

During a full restore, shardmanctl checks whether the number of replication groups in the target cluster matches the number of replication groups in the backup. This means that you cannot restore on an empty cluster, but need to add as many replication groups as necessary for the total number of them to match that of the cluster from which the backup was taken.
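The compatibility rule described above can be expressed as a small check. This is only a sketch: the ClusterInfo type, its fields, and the placeholder version strings are hypothetical and do not correspond to the format of the backup description file or to shardmanctl internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterInfo:
    """Hypothetical summary of a cluster or of a backup taken from one."""
    shardman_version: str
    replication_groups: int


def restore_problems(backup: ClusterInfo, target: ClusterInfo) -> List[str]:
    """List the reasons why `backup` cannot be restored onto `target`."""
    problems = []
    if backup.shardman_version != target.shardman_version:
        problems.append(
            f"version mismatch: backup {backup.shardman_version}, "
            f"target {target.shardman_version}"
        )
    if backup.replication_groups != target.replication_groups:
        problems.append(
            f"backup has {backup.replication_groups} replication groups, "
            f"target has {target.replication_groups}"
        )
    return problems


# Placeholder values: an empty target cluster cannot accept a 3-group backup.
backup = ClusterInfo(shardman_version="vA", replication_groups=3)
empty_target = ClusterInfo(shardman_version="vA", replication_groups=0)
print(restore_problems(backup, empty_target))
```

An empty result means that both conditions hold. In the example, the target first needs three replication groups before a full restore can proceed.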

You can also restore only a single shard using the --shard parameter.

shardmanctl conducts full restore in several steps. The tool:

  1. Takes necessary locks in etcd to prevent concurrent cluster-wide operations and tries to assign replication groups in the backup to existing replication groups. If it cannot do this (for example, due to cluster incompatibility), the recovery fails.

  2. Restores part of the etcd metadata: the cluster specification and parts of replication group definitions.

  3. When the correct metadata is in place, runs stolon init in PITR initialization mode with RecoveryTargetName set to the value of the Syncpoint LSN from the backup info file. DataRestoreCommand and RestoreCommand are also taken from the backup info file.

  4. Waits for each replication group to recover.

4.5.3. Cluster backup with pg_probackup

This section describes the basics of backup and recovery in Shardman with the probackup command.

You can use the probackup backup command of the shardmanctl tool to perform binary backups of a Shardman cluster into the backup repository on the local (backup) host, and the probackup restore command to restore the cluster from the selected backup. Full and partial (delta) backups are supported.

4.5.3.1. Requirements

To back up and restore a Shardman cluster, the following requirements must be met:

  • The Shardman cluster configuration parameter enable_csn_snapshot must be set to on. This parameter is necessary for the cluster backup to be consistent. If it is disabled, a consistent backup is not possible;

  • On the backup host, Shardman utilities must be installed into /opt/pgpro/sdm-14/bin;

  • On the backup host and on each cluster node, pg_probackup must be installed into /opt/pgpro/sdm-14/bin;

  • On the backup host, the postgres Linux user and group must be created;

  • Passwordless ssh between the backup host and each Shardman cluster node must be configured for the postgres Linux user;

  • The backup folder must be created;

  • The postgres Linux user must be granted access to the backup folder;

  • The shardmanctl utility must be run as the postgres Linux user;

  • The init subcommand, which initializes the backup repository, must be successfully executed on the backup host;

  • The archive-command add subcommand, which enables archive_command for each replication group to stream WAL into the initialized repository, must be successfully executed on the backup host.
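Several of the requirements above can be verified on the backup host before running shardmanctl. The following sketch checks that the expected binaries exist in the installation directory, that the current user is postgres, and that the backup folder is accessible. The function name and its parameters are assumptions made for this example. Passwordless ssh, repository initialization, and the archive_command setup are not checked here.

```python
import getpass
import os
from typing import List, Optional

BIN_DIR = "/opt/pgpro/sdm-14/bin"  # installation path named in the requirements


def preflight(backup_dir: str,
              bin_dir: str = BIN_DIR,
              current_user: Optional[str] = None) -> List[str]:
    """Return the unmet requirements; an empty list means every check passed."""
    problems = []
    # Both tools are expected in the same directory on the backup host.
    for tool in ("shardmanctl", "pg_probackup"):
        path = os.path.join(bin_dir, tool)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems.append(f"{tool} is not an executable file in {bin_dir}")
    # shardmanctl must be run as the postgres Linux user.
    user = current_user if current_user is not None else getpass.getuser()
    if user != "postgres":
        problems.append(f"running as {user}, expected postgres")
    # The backup folder must exist and be accessible to the current user.
    if not os.path.isdir(backup_dir):
        problems.append(f"backup folder {backup_dir} does not exist")
    elif not os.access(backup_dir, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"insufficient access to {backup_dir}")
    return problems
```

Calling preflight with the path of the backup folder returns an empty list when all of these checks pass, and otherwise one message per unmet requirement.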

4.5.3.2. Cluster backup process

shardmanctl conducts a backup task in several steps. The tool:

  1. Takes necessary locks in etcd to prevent concurrent cluster-wide operations.

  2. Connects to a random replication group and locks Shardman metadata tables to prevent modification of foreign servers during the backup.

  3. Dumps Shardman metadata, stored in etcd, to a JSON file in the backup directory.

  4. To get backups from each replication group, concurrently runs pg_probackup using the configured archive_command.

  5. Creates a Syncpoint and gets the LSN for each replication group from the Syncpoint data structure. Then uses the pg_probackup archive-push command to push the WAL generated after the backup finishes, including the WAL file that contains the Syncpoint LSN, for each replication group.
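To illustrate which WAL file has to reach the repository for a given Syncpoint LSN, the sketch below computes the name of the WAL segment that contains an LSN, following the standard PostgreSQL segment naming scheme (timeline ID, then the segment number split into two 8-digit hexadecimal parts). It assumes the default 16 MB segment size and a known timeline ID. The sample LSNs are made up, and this is not the code that shardmanctl itself runs.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # PostgreSQL default, fixed at initdb time


def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(timeline: int, lsn: str,
                     segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the name of the WAL segment file that contains `lsn`."""
    segment_no = parse_lsn(lsn) // segment_size
    # Number of segments that fit into one 4 GB "log id" range.
    segments_per_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_no // segments_per_id,
        segment_no % segments_per_id,
    )


# Hypothetical Syncpoint LSNs on timeline 1.
print(wal_segment_name(1, "0/3000060"))  # 000000010000000000000003
print(wal_segment_name(1, "1/A0"))       # 000000010000000100000000
```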

4.5.4. Cluster restore from a backup with pg_probackup

You can restore a backup on the same cluster or on a compatible one. A compatible cluster is one that uses the same Shardman version and has the same number of replication groups.

shardmanctl can perform either full restore or metadata-only restore. Metadata-only restore is useful if issues are encountered with the etcd instance, but DBMS data is not corrupted.

During metadata-only restore, shardmanctl restores etcd data from the dump created during the backup.

Important

Restoring metadata to an incompatible cluster can lead to catastrophic consequences, including data loss, since the metadata state can differ from the actual configuration layout. Do not perform metadata-only restore if there were cluster reconfigurations after the backup, such as addition or deletion of nodes, even if the same nodes were added back again.

During a full restore, shardmanctl checks whether the number of replication groups in the target cluster matches the number of replication groups in the backup. This means that you cannot restore on an empty cluster, but need to add as many replication groups as necessary for the total number of them to match that of the cluster from which the backup was taken.

You can also restore only a single shard using the --shard parameter.

shardmanctl conducts full restore in several steps. The tool:

  1. Takes necessary locks in etcd to prevent concurrent cluster-wide operations and tries to assign replication groups in the backup to existing replication groups. If it cannot do this (for example, due to cluster incompatibility), the recovery fails.

  2. Restores part of the etcd metadata: the cluster specification and parts of replication group definitions.

  3. When the correct metadata is in place, runs stolon init in PITR initialization mode with RecoveryTargetName set to the value of the Syncpoint LSN from the backup info file. DataRestoreCommand and RestoreCommand are also taken from the backup info file. These commands are generated automatically during the backup phase, so editing the file that contains the Shardman cluster backup description is not recommended. During the restore, the WAL files containing the final LSN to restore to are requested automatically for each replication group from the backup repository on the remote backup node via the pg_probackup archive-get command.

  4. Waits for each replication group to recover.

  5. Finally, archive_command must be enabled again.