pg_probackup

pg_probackup
Prev	Up	G.1. Client Applications	Home	Next

pg_probackup — manage backup and recovery of a Postgres Pro Enterprise cluster

Synopsis

pg_probackup [option...] init

pg_probackup [option...] backup

pg_probackup [option...] restore [backup_id]

pg_probackup [option...] validate [backup_id]

pg_probackup [option...] show [backup_id]

pg_probackup [option...] delete backup_id

pg_probackup [option...] delwal [backup_id]

pg_probackup [option...] retention {show | purge}

Description

pg_probackup is a utility to manage backup and recovery of Postgres Pro Enterprise database clusters. It works with server version 9.5 or higher.

The utility makes a binary copy of database cluster files, almost like pg_basebackup does. However, pg_probackup provides additional features required for implementing different backup strategies and dealing with large amounts of data:

Single backup catalog for managing backups, including multi-server replication configurations.
Support for parallel backup and restore.
Support for page-level incremental backups.
Consistency checks for database cluster files and backups.

pg_probackup understands the structure of database cluster files and works on page level to store only meaningful parts of data pages in backups and to check data consistency when checksums are enabled. Backups are also checked for correctness to detect possible disk failures.

Backups along with additional meta information are created in a special backup directory. Continuous archiving should be directed to that directory too. Backup directory must be accessible in the file system of database server; owner of Postgres Pro Enterprise process must have full access to contents of this directory. Usual practice is to place backup directory on a separate server, in which case some network file system should be used.

The same backup directory can be used simultaneously by several Postgres Pro Enterprise servers with replication configured between them. Backups can be made from either primary or standby server, and managed in a single backup strategy. To avoid conflicts, pg_probackup does not allow storing backups for different clusters in a single directory.

Options

Options for pg_probackup utility can be specified in command line (such options are shown below starting from either one or two minus signs). If not given in command line, values for some options are derived from environment variables (names of environment variables are in uppercase). Otherwise, values for some options are taken from pg_probackup.conf configuration file located in the backup directory (such option names are in lowercase).

Common Options

-B directory --backup-path=directory: Absolute path to the backup directory. In this directory backups and WAL archive are stored.
-D directory --pgdata=directory PGDATA pgdata: Absolute path to database cluster's data directory.
-j num_threads --threads=num_threads: Number of parallel threads for backup, recovery, and backup validation.
--progress: Shows progress of operations.
-q --quiet: Do not display any messages.
-v --verbose: Print detailed information during processing.
-V --version: Print the pg_probackup version and exit.

Backup Options

-b mode --backup-mode=mode BACKUP_MODE backup_mode: Backup mode. Supported modes are: FULL (full backup), PAGE (incremental backup, tracking changes by scanning WAL files), PTRACK (incremental backup, tracking changes on-the-fly). The last mode is not supported with standard PostgreSQL server.
--stream: Makes an autonomous backup that includes all necessary WAL files, by streaming them from database server via replication protocol.
-S slot_name --slot=slot_name: This option causes the WAL streaming to use the specified replication slot, and is used together with --stream.
-C --smooth-checkpoint SMOOTH_CHECKPOINT smooth_checkpoint: Causes checkpoint to be spread out over a period of time (default is to complete checkpoint as soon as possible).
--backup-pg-log: Includes pg_log directory (where the log messages are stored usually) in the backup. By default this directory is excluded.

Restore and Validate Options

--time=time: Specifies the timestamp up to which recovery will proceed.
--xid=xid: Specifies the transaction ID up to which recovery will proceed.
--inclusive=boolean: Specifies whether to stop just after the specified recovery target (true), or just before the recovery target (false).
--timeline=timeline: Specifies recovering into a particular timeline.
-T OLDDIR=NEWDIR --tablespace-mapping=OLDDIR=NEWDIR: Relocate the tablespace from directory OLDDIR to NEWDIR at the time of recovery. Both OLDDIR and NEWDIR must be absolute paths.

Delete Options

--wal: Delete WAL files that are no longer necessary to restore from any of existing backups.

Retention Options

--redundancy redundancy

Specifies the number of full backup copies to keep in the data directory. Must be a positive integer.

You can use this option with the retention purge command to override the default retention policy set in pg_probackup.conf.

--window window

Defines the recovery point, in the number of days from the current moment. This is the earliest point in time for which pg_probackup can complete the recovery. For example, if window=7, pg_probackup must keep at least one full backup copy that is older than seven days, with all the corresponding WAL files.

You can use this option with the retention purge command to override the default retention policy set in pg_probackup.conf.

Connection parameters

pg_probackup also accepts the following command-line arguments for connection parameters:

-d dbname --dbname=dbname PGDATABASE

Specifies the name of the database to connect to. The connection is used only for managing backup process, so you can connect to any existing database.

-h host --host=host

Specifies the host name of the machine on which the server is running. If the value begins with a slash, it is used as the directory for the Unix domain socket.

-p port --port=port

Specifies the TCP port or local Unix domain socket file extension on which the server is listening for connections.

-U username --username=username

User name to connect as.

-w --no-password

Never issue a password prompt. If the server requires password authentication and a password is not available by other means such as a .pgpass file, the connection attempt will fail. This option can be useful in batch jobs and scripts where no user is present to enter a password.

-W --password

Force pg_probackup to prompt for a password before connecting to a database.

This option is never essential, since pg_probackup will automatically prompt for a password if the server demands password authentication. However, pg_probackup will waste a connection attempt finding out that the server wants a password. In some cases it is worth typing -W to avoid the extra connection attempt.

Exit Status

On success pg_probackup returns 0 to the shell. Other status codes indicate an error: 1 — generic error, 2 — repeated error, and 3 — unexpected error.

Usage

Initial Setup

In any usage scenario, first of all Postgres Pro Enterprise server should be configured and backup catalog should be initialized.

pg_probackup initial setup, as well as further work with the utility, is performed by Postgres Pro Enterprise process owner (usually postgres).

A connection to database server is required for pg_probackup to take backups. Database user, which pg_probackup is connected as, must have sufficient privileges to execute some administrative functions. The user must also have REPLICATION attribute in order to make autonomous backups. pg_probackup can be connected as a superuser, but it is advisable to create a separate user with the following minimum required privileges:

CREATE ROLE backup WITH LOGIN REPLICATION;
GRANT USAGE ON SCHEMA pg_catalog TO backup;
GRANT EXECUTE ON FUNCTION current_setting(text) TO backup;
GRANT EXECUTE ON FUNCTION pg_is_in_recovery() TO backup;
GRANT EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) TO backup;
GRANT EXECUTE ON FUNCTION pg_stop_backup() TO backup;
GRANT EXECUTE ON FUNCTION pg_stop_backup(boolean) TO backup;
GRANT EXECUTE ON FUNCTION pg_switch_xlog() TO backup;
GRANT EXECUTE ON FUNCTION txid_current() TO backup;
GRANT EXECUTE ON FUNCTION txid_current_snapshot() TO backup;
GRANT EXECUTE ON FUNCTION txid_snapshot_xmax(txid_snapshot) TO backup;

When using Postgres Pro server, additional privileges are required for taking incremental backups:

GRANT EXECUTE ON FUNCTION pg_ptrack_clear() TO backup;
GRANT EXECUTE ON FUNCTION pg_ptrack_get_and_clear(oid, oid) TO backup;

Postgres Pro Enterprise server configuration must accept connections for the user in pg_hba.conf. For autonomous backups, replication connections must also be accepted, and max_wal_senders value should be high enough to allow pg_probackup to connect for streaming WAL files during backup.

The wal_level parameter must be replica of higher (archive for versions below 9.5).

To initialize backup directory, execute the following command:

pg_probackup init -B backup_directory -D data_directory

The -B option specifies the directory where backups and meta information will be stored. As this option is required for all pg_probackup commands, it makes sense to specify it once in the BACKUP_PATH environment variable.

The -D option specifies the database cluster's data directory. It is handy to set it in the PGDATA environment variable instead of specifying it every time in command line. In the subsequent examples these options are omitted.

The utility creates the specified directory and all the necessary files and subdirectories in it:

pg_probackup.conf — configuration file with default values for some of the options. The full list of options is provided in the section called “Options”.
wal/ — directory for WAL files.
backups/ — directory for backups. The utility will create separate subdirectories for each backup it makes, named by the backup identifier.

The backup directory can be created beforehand, but in this case it must be empty.

Autonomous Backups

Autonomous backups offer the simplest way to make a backup without the need to configure Postgres Pro Enterprise for continuous archiving. Such backups contain database cluster files as well as WAL files necessary for recovery.

Without a WAL files archive, database cluster can be restored from an autonomous backup only to its state at the moment the backup was taken.

To make an autonomous backup, execute the following command:

pg_probackup backup -b full --stream

Additionally this command should be supplied with connection options. These options are specified exactly the same way as for other Postgres Pro Enterprise utilities: either by command line options (-h/--host, -p/--port, -U/--username, -d/--dbname) or by environment variables (PGHOST, PGPORT, PGUSER, PGDATABASE). If nothing is given, the default values are taken (local connection, both database user name and database name are the same as operating user name). Any database in the cluster can be specified to connect to.

To view the existing backups, run the command:

pg_probackup show

The following information is given:

ID — the backup identifier. It is used for pointing to a specific backup in many commands.
Recovery time — the earliest moment for which you can restore the state of the database cluster.
Mode — the method used to take this backup (FULL+STREAM — autonomous backup; other modes are described below: FULL, PAGE, PTRACK).
Current/Parent TLI — current and parent timelines of the database cluster.
Time — time it took to peform the backup.
Data — size of data in this backup.
Status — state of the backup (OK — the backup is created and ready for use, ERROR — an error happened while the backup was being taken, CORRUPT — the backup is corrupted and cannot be used).

To get more detailed information about the backup, specify its identifier in show command:

pg_probackup show backup_id

To make sure a backup is correctly written to disk, pg_probackup automatically checks its checksums immediately after the backup was taken. A backup can be explicitly revalidated by running the following command:

pg_probackup validate backup_id

For details, see the section called “Checking Cluster and Backup Consistency” and the section called “Validating Backups”.

To restore the database cluster from the backup, first stop the Postgres Pro Enterprise service (if it is still running) and then execute the following command:

pg_probackup restore backup_id

Then start the database service. During startup, Postgres Pro Enterprise will recover a self-consistent state by replaying WAL files and will be ready to accept connections.

Note that restoring from a backup can be performed exclusively by pg_probackup utility. Inside backup_directory/backups/backup_id/database/ directory you can find the files corresponding to those in the cluster's data directory. Nevertheless, you cannot copy these files directly into the data directory as pg_probackup always stores them compressed to save disk space.

Continuous Archiving and Full Backups

Continuous archiving allows to restore database cluster's state not only at the moment backup was taken, but at arbitrary point in time. In most cases pg_probackup is used along with continuous archiving.

Note that autonomous backups can still be useful together with continuous archiving:

Autonomous backup can be restored on the server that for some reasons has no file access to WAL archive;
To avoid running out of disk space in WAL archive, it should be periodically cleaned up. An autonomous backup allows to restore cluster's state at some point in time, for which WAL files are no longer available. (However one should prefer logical backups made by pg_dumpall for long-term storage, as it is possible that major release of Postgres Pro Enterprise will change during that period.)

To enable continuous archiving on Postgres Pro Enterprise server, configure the following parameters:

archive_mode to 'on';
archive_command to 'test ! -f backup_directory/wal/%f && cp %p backup_directory/wal/%f';

Utilities like rsync that copy WAL files over network, are not currently supported; files must be accessible in server's file system. To access files from a remote server, a network file system can be used.

To take a backup, execute the following command (specify additional connection options if needed):

pg_probackup backup -b full

The backup will only contain database cluster's files. WAL files necessary for recovery will be read from archive in backup_directory/wal/.

To restore the cluster from a backup, make sure that the database service is stopped and run the following command:

pg_probackup restore

The database cluster will be restored from the latest backup available, and recovery.conf file will be created to access the archived WAL files. When started, Postgres Pro Enterprise server will automatically recover database cluster's state using all available WAL files in the archive.

To restore the cluster's state at some arbitrary point in time, the following options (which correspond to recovery options in recovery.conf) can be added:

--timeline specifies recovering into a particular timeline;
one of --time or --xid options specifies recovery target (either point in time or transaction id) up to which recovery will proceed.
--inclusive specifies whether to stop just after the specified recovery target, or just before it.

pg_probackup automatically chooses the backup that is the closest to the specified recovery target, and starts the recovery.

All the described commands can operate autonomous backups the same way as full ones, using WAL files either from the backup itself or from the archive.

A backup identifier can be specified right after the restore command to restore database cluster's state at the moment shown in 'Recovery time' attribute of that backup:

pg_probackup restore backup_id

For autonomous backups WAL archive will not be used. Full backups will use WAL archive only to recover to self-consistent state.

When both backup identifier and one of --time or --xid options are specified for restore command, recovery will start from the specified backup and will proceed up to the specified recovery target. Usually there is no need in such mode, as backup identifier can be omitted to allow pg_probackup to choose it automatically.

Incremental Backups

In addition to full backups pg_probackup allows to take incremental backups, containing only the pages that have changed since the previous backup was taken. This way backups are smaller and may take less time to complete.

There are two modes for incremental backups: to track changes by scanning WAL files (PAGE), and to track changes on-the-fly (PTRACK).

In the first mode pg_probackup scans all WAL files in archive starting from the moment the previous backup (either full or incremental) was taken. Newly created backup will contain only the pages that were mentioned in WAL records.

This way of operation requires all the WAL files since the previous backup to be present in the archive. In case the total size of these files is comparable to total size of database cluster's files, there will be no speedup (but still backup can be smaller by size).

pg_probackup backup -b page

The second mode (tracking changes on-the-fly) requires Postgres Pro server and will not work with the standard PostgreSQL; continuous archiving is not necessary for it to operate.

When ptrack_enable parameter is on, Postgres Pro server tracks changes in data pages. Each time a WAL record for some relation's page is constructed, this page is marked in a special ptrack fork for this relation. As one page requires just one bit in the fork, the fork is quite small but significantly speeds up the process of taking a backup. Tracking implies some minor overhead for the database server.

While taking a backup (either full or incremental), pg_probackup clears ptrack fork of relations being processed. This ensures that the next incremental backup will contain only pages that have changed since the previous backup.

pg_probackup backup -b ptrack

If a backup resulted in an error (for example, it was interrupted), some of relations probably have their ptrack forks already cleared. In this case next incremental backup will contain just part of all changes, which is useless. The same is true when ptrack_enable parameter was turned on after the full backup was taken or when it was turned off for some time. Currently, pg_probackup does not verify that all changes for the increment were actually tracked. Fresh full backup should be taken before incremental ones in such circumstances.

To restore the database cluster from an incremental backup, pg_probackup first restores the full backup and then sequentially applies all the necessary increments. This is done automatically; restoration is managed exactly the same way as for full backups.

Incremental backup can be made autonomous by specifying --stream command line option. Such backup is autonomous only in regard to WAL archive: full backup and previous incremental backups are still needed to restore the cluster.

Configuring Backup Retention Policy

By default, all backup copies created with pg_probackup are stored in the specified backup directory. To save disk space, you can configure retention policy and set up your system to periodically clean up redundant backup copies accordingly.

To configure retention policy, set one or more of the following variables in the pg_probackup.conf file:

redundancy — specifies the number of full backup copies to keep in the data directory.
window — defines the earliest point in time for which pg_probackup can complete the recovery. This option is set in the number of days from the current moment. For example, if window=7, pg_probackup must keep at least one full backup copy that is older than seven days, with all the corresponding WAL files.

To view the current retention policy settings, run this command:

pg_probackup retention show

To clean up the backup directory in accordance with retention policy, run:

pg_probackup retention purge

pg_probackup deletes all backup copies and the corresponding WAL files that do not conform to the policy defined in the pg_probackup.conf file. If both redundancy and window options are set, pg_probackup keeps backup copies that satisfy both conditions. For example, if you set redundancy=2 and window=7, pg_probackup cleans up the backup directory to keep only two full backup copies if at least one of them is older than 7 days.

Alternatively, you can override the default policy using --redundancy and --window command-line options. For example:

pg_probackup retention purge --redundancy=3 --window=5

As a result, pg_probackup keeps three backup copies, at least one of which is older than five days.

Deleting Backups

Unnecessary backup can be deleted by specifying its identifier in delete command:

pg_probackup delete backup_id

This command will delete the specified backup along with all the following incremental backups, if any.

This way it is possible to delete some recent incremental backups, retaining an underlying full backup and some of incremental backups that follow it. In this case the next backup in PTRACK mode will not be correct as some changes since the last retained backup will be lost. Either full backup or incremental backup in PAGE mode (given that all necessary WAL files are still in the archive) should be taken then.

If --wal option is specified, WAL files not necessary to restore any of remaining backups will be deleted as well. This is safe, because deletion of any backup will keep every possibly necessary WAL files.

To delete unnecessary WAL files without deleting any of backups, execute delwal command:

pg_probackup delwal

This command operates the same way as --wal option of delete command, except that it does not delete any backups.

Backup identifier can be specified in delwal command. In this case all WAL files will be deleted, except for those needed to restore from the specified backup and more recent backups.

pg_probackup delwal backup_id

This mode should be used with caution as it allows to delete WAL files required for some of the existing backups.

Alternatively, you can configure backup retention policy and set up your system to periodically clean up redundant backup copies accordingly. For details, see the section called “Configuring Backup Retention Policy”.

Backup from Standby

If replication is in use, starting with server version 9.6 a backup can be taken not only from primary server, but also from standby. Backup taken from standby is absolutely interchangeable with backup taken from primary (bearing in mind possible replication delay). Page-level incremental backups from standby are not allowed.

Currently it is required for primary database server to have full_page_writes turned on (in future this requirement may be relaxed in the case checksums are enabled on data pages).

The same backup directory can be used for pg_probackup on both servers, primary and standby, as long as it is accessible in both server's file systems. This way all backups, taken from either primary or standby, are shown together and could be managed from one server or from the other.

A backup can be used to restore primary database server as well as standby. It depends on the server on which pg_probackup is executed with restore command. Note that recovered Postgres Pro Enterprise will always run as primary server if started right after the pg_probackup. To run it as standby, edit recovery.conf file created by pg_probackup: at least delete every parameter that specify recovery target (recovery_target, recovery_target_time, and recovery_target_xid), change target timeline to 'latest', and add standby_mode = 'on'. Probably primary_conninfo should be added too for streaming replication, and hot_standby = 'on' in database configuration parameters for hot standby mode.

Additional Features

Parallel Execution

Backup, recovery, and validating process can be executed in several parallel threads. This can significantly speed up the operation given enough resources (CPU cores, disk, and network throughput).

Parallel execution is controlled by -j/--threads command line option, for example:

pg_probackup backup -b full -j 4

pg_probackup restore -j 4

Note that parallel recovery applies only to copying data from backup to cluster's data directory. When Postgres Pro Enterprise server is started, it starts to replay WAL records (either from the archive or from local directory), and this currently cannot be done in parallel.

Checking Cluster and Backup Consistency

When checksums are enabled for the database cluster, pg_probackup uses this information to check correctness of data files. While reading each page, pg_probackup checks whether the calculated checksum coincides with the checksum stored in the page. This guarantees that backup is free of corrupted pages; taking full backup effectively checks correctness of all cluster's data files.

Pages are packed before going to backup, leaving unused parts of pages behind (see Section 63.6). Hence the restored database cluster is not an exact copy of the original, but is binary-compatible with it.

Whether page checksums are enabled or not, pg_probackup calculates checksums for each file in a backup. Checksums are checked immediately after backup is taken and right before restore, to timely detect possible backup corruptions.

Validating Backups

Before restoring the database cluster, you can validate the available backups. To make sure the recovery is possible, run the validate command with the exact options you are going to use for recovery. pg_probackup checks that all the required backup files are present and can be used to restore the database cluster if needed.

For example, to check that you can restore the database cluster from a backup copy with the backup_id up to the specified xid transaction ID, run this command:

pg_probackup validate --xid=xid backup_id

For the list of available options, see the section called “Options”.

If validation completed successfully, pg_probackup displays the corresponding message. If validation fails, you will receive an error message with the exact time and transaction ID up to which the recovery is possible.

Notes

Currently pg_probackup has the following limitations:

The utility can be used only with PostgreSQL-compatible servers with the same major release and the same page size.
Server version 9.5 or higher is supported.
Microsoft Windows operating system is not supported.
Incremental backups in the PTRACK mode cannot be taken with standard PostgreSQL server.
Configuration files outside Postgres Pro Enterprise data directory are not included in backup and should be backed up separately.

Authors

pg_probackup utility is based on pg_arman, that was originally written by NTT and then developed and maintained by Michael Paquier.

Features such as parallel execution, incremental and autonomous backups are developed in Postgres Professional by Yury Zhuravlev (aka stalkerg).

Please report bugs and requests at https://github.com/postgrespro/pg_probackup/issues.