shardman-ladle

shardman-ladle — deployment tool for Shardman

Synopsis

shardman-ladle [common_options] init [ -f | --spec-file spec_file_name] | spec_text

shardman-ladle [common_options] addnodes -n | --nodes node_names [--no-rebalance]

shardman-ladle [common_options] cleanup [ -p | --processrepgroups ]

shardman-ladle [common_options] rmnodes -n | --nodes node_names

shardman-ladle [common_options] status [ -f | --format text | json ]

shardman-ladle [common_options] backup --datadir directory [ --maxtasks number_of_tasks ]

shardman-ladle [common_options] recover [--info file] [--dumpfile file] [--metadata-only] [--timeout seconds]

shardman-ladle [common_options] probackup [ init | archive-command | backup | restore | show | validate ] [subcommand options]

Here common_options are:

[--cluster-name cluster_name] [--log-level error | warn | info | debug ] [--retries retries_number] [--session-timeout seconds] [--store-endpoints store_endpoints] [--store-ca-file store_ca_file] [--store-cert-file store_cert_file] [--store-key client_private_key] [--store-timeout duration] [--version] [ -h | --help ]

Description

shardman-ladle is a utility to initialize a Shardman cluster, add nodes to or remove nodes from the cluster, perform cleanup after unsuccessful operations, or display the status of a cluster.

The init command is used to register a new Shardman cluster in the etcd store. In the init mode, shardman-ladle reads the cluster specification, processes it and saves it to the etcd store as parts of two JSON documents: ClusterSpec — as part of shardman/cluster0/clusterdata and LadleSpec — as part of shardman/cluster0/ladledata (cluster0 is the default cluster name used by Shardman utilities). Common options related to the etcd store, such as --store-endpoints, are also saved to the etcd store and pushed down to all Shardman services started by shardman-bowl. For the description of the Shardman initialization file format, see sdmspec.json. For usage details of the command, see the section called “Registering a Shardman Cluster”.
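Assuming the etcd v3 API and the default key layout described above, the saved documents can be inspected directly with etcdctl (a sketch; the endpoint and cluster name are the defaults, adjust them as needed):

```shell
# Inspect the metadata that init saved to the etcd store:
etcdctl --endpoints=http://127.0.0.1:2379 get shardman/cluster0/clusterdata
etcdctl --endpoints=http://127.0.0.1:2379 get shardman/cluster0/ladledata
```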

The addnodes command is used to add new nodes to a Shardman cluster. With the default clover placement policy, nodes are added to a cluster by clovers. Each node in a clover runs its primary DBMS instance and possibly several replicas of other nodes in the clover. The number of replicas is determined by the Repfactor configuration parameter. So, each clover consists of Repfactor + 1 nodes and can withstand the loss of Repfactor nodes. For example, with Repfactor = 1, each clover consists of two nodes and survives the failure of either of them.

shardman-ladle performs the addnodes operation in several steps. The command:

  1. Takes a global metadata lock.

  2. For each specified node, checks that shardman-bowl is running on it and that it sees the current cluster configuration.

  3. Calculates the services to be present on each node and saves this information in etcd as part of the shardman/cluster0/ladledata Layout object.

  4. Generates the configuration for new stolon clusters (also called replication groups) and initializes them.

  5. Waits for shardman-bowl to start all the necessary services, checks that new replication groups are accessible and have correct configuration.

  6. For each new replication group in the cluster except the first one, copies the schema from a random existing replication group to the new one, ensures that the Shardman extension is installed on the new replication group, and recalculates OIDs used in the extension configuration tables.

  7. On each existing replication group, defines foreign servers referencing the new replication group and recreates definitions of foreign servers on the new replication group.

  8. Recreates all partitions of sharded tables and all global tables as foreign tables referencing data from old replication groups, and registers the changes in the etcd store.

  9. Rebalances partitions of sharded tables. The data for these partitions is transferred from existing nodes using logical replication. When the data is in place, the foreign table corresponding to the partition is replaced with a regular table, the old partition is replaced with a foreign table, and all foreign tables referencing the data in the original replication group are modified to reference the new one. You can skip this step using the --no-rebalance option.

  10. Registers the added replication groups in shardman/cluster0/ladledata.

For usage details of the command, see the section called “Adding Nodes to a Shardman Cluster”.

The cleanup command is used for cleanup after a failure of the addnodes command or of the shardmanctl rebalance command. Final changes to the etcd store are made at the end of the command execution, which simplifies the cleanup process. During cleanup, incomplete clover definitions and definitions of the corresponding replication groups are removed from the etcd metadata. Definitions of the corresponding foreign servers are removed from the DBMS metadata of the remaining replication groups. Since the cleanup process can be destructive, by default, the tool operates in the report-only mode: it only shows actions to be done during the actual cleanup. For usage details of the command, see the section called “Performing Cleanup”.

The rmnodes command is used to remove nodes from a Shardman cluster. This command removes clovers containing the specified nodes from the cluster. The last clover in the cluster cannot be removed. Any data (such as partitions of sharded relations) on removed replication groups is migrated to the remaining replication groups using logical replication, and all references to the removed replication groups (including definitions of foreign servers) are removed from the metadata of the remaining replication groups. Finally, the metadata in etcd is updated. For usage details of the command, see the section called “Removing Nodes from a Shardman cluster”.

The status command is used to display health status of Shardman cluster subsystems. The command checks the availability of all etcd cluster nodes, consistency of metadata stored in etcd, correctness of replication group definitions, availability of shardman-bowl daemons and of all DBMS instances in the cluster. The issues are reported in plain-text or JSON format. For usage details of the command, see the section called “Getting the Health Status of Cluster Subsystems”.

The backup command is used to back up a Shardman cluster. A backup consists of a directory with base backups of all replication groups and the WAL files needed for recovery. The etcd metadata is saved to the etcd_dump file. The backup_info file is created during a backup and contains the backup description. For details of the backup command logic, see Cluster backup with pg_basebackup. For usage details of the command, see the section called “Backing up a Shardman Cluster”.

The recover command is used to restore a Shardman cluster from a backup created by the backup command. For details of the recover command logic, see Cluster recovery from a backup using pg_basebackup. For usage details of the command, see the section called “Restoring a Shardman Cluster”.

The probackup command is used to back up and restore the Shardman cluster using the pg_probackup backup utility. For details of the probackup command logic, see Backup and Recovery of Shardman Backups using pg_probackup. For usage details of the command, see the section called “probackup”.

Command-line Reference

This section describes shardman-ladle commands. For usage details, see the section called “Usage”. For Shardman common options used by the commands, see the section called “Common Options”.

init

Syntax:

shardman-ladle [common_options] init [-f|--spec-file spec_file_name]|spec_text

Registers a Shardman cluster in the etcd store.

-f spec_file_name
--spec-file=spec_file_name

Specifies the file with the cluster specification string. The value of - means the standard input. By default, the string is passed in spec_text. For usage details, see the section called “Registering a Shardman Cluster”.

addnodes

Syntax:

shardman-ladle [common_options] addnodes -n|--nodes node_names [--no-rebalance]

Adds nodes to a Shardman cluster.

-n node_names
--nodes=node_names

Required.

Specifies the comma-separated list of nodes to be added.

--no-rebalance

Skip the step of rebalancing partitions of sharded tables. For more details, see the section called “Adding Nodes to a Shardman Cluster”.

cleanup

Syntax:

shardman-ladle [common_options] cleanup [-p|--processrepgroups]

Performs cleanup after a failure of the addnodes command or of the shardmanctl rebalance command.

-p
--processrepgroups

Executes the actual cleanup. By default, the tool only shows the actions to be done during the actual cleanup. For more details, see the section called “Performing Cleanup”.

rmnodes

Syntax:

shardman-ladle [common_options] rmnodes -n|--nodes node_names

Removes nodes from a Shardman cluster.

-n node_names
--nodes=node_names

Required.

Specifies the comma-separated list of nodes to be removed. For usage details, see the section called “Removing Nodes from a Shardman cluster”.

status

Syntax:

shardman-ladle [common_options] status [-f|--format text|json]

Reports on the health status of Shardman cluster subsystems.

-f text|json
--format=text|json

Specifies the report format: plain-text or JSON.

Default: text.

For more details, see the section called “Getting the Health Status of Cluster Subsystems”.

backup

Syntax:

shardman-ladle [common_options] backup --datadir directory [--maxtasks number_of_tasks]

Backs up a Shardman cluster.

--datadir directory

Required.

Specifies the directory to write the output to. If the directory exists, it must be empty. If it does not exist, shardman-ladle creates it (but not parent directories).

--maxtasks number_of_tasks

Specifies the maximum number of concurrent tasks (pg_receivewal or pg_basebackup commands) to run.

Default: 0 (no restriction).

For more details, see the section called “Backing up a Shardman Cluster”.

recover

Syntax:

shardman-ladle [common_options] recover [--info file] [--dumpfile file] [--metadata-only] [--timeout seconds]

Restores a Shardman cluster from a backup created by the backup command.

--dumpfile file

Required for metadata-only restore.

Specifies the file to load the etcd metadata dump from.

--info file

Required for full restore.

Specifies the file to load information about the backup from.

--metadata-only

Perform metadata-only restore. By default, full restore is performed.

--timeout seconds

Exit with an error if the cluster is not ready or the recovery is not complete after waiting for the specified number of seconds.

For more details, see the section called “Restoring a Shardman Cluster”.

probackup

Syntax:

shardman-ladle [common_options] probackup 
       [init|archive-command|backup|restore|show|validate]
       [--help]
       [subcommand options]
  

Creates a backup of the Shardman cluster or restores the Shardman cluster from a backup using pg_probackup.

Subcommands:

init

Initializes a new repository folder for the Shardman cluster backup.

archive-command

Adds and enables, or disables, archive_command for each replication group in the Shardman cluster.

backup

Creates a backup of the Shardman cluster.

restore

Restores the Shardman cluster from the selected backup.

show

Shows the list of backups of the Shardman cluster.

validate

Checks the selected Shardman cluster backup for integrity.

--help

Shows subcommand help.

init

Syntax:

shardman-ladle probackup init 
        -B|--backup-path path 
        -E|--etcd-path path 
        [--remote-port port] 
        [--remote-user username] 
        [--ssh-key path] 
        [-t|--timeout seconds] 
        [-m|--maxtasks number_of_tasks]
    

Initializes a new repository folder for the Shardman cluster backup.

-B path
--backup-path path

Required. Specifies the path to the backup catalog where the Shardman cluster backups should be stored.

-E path
--etcd-path path

Required. Specifies the path to the catalog where the etcd dumps should be stored.

--remote-port port

Specifies the remote SSH port for the replication group instances. Default: 22.

--remote-user username

Specifies the remote SSH user for the replication group instances. Default: postgres.

archive-command

Syntax:

shardman-ladle probackup archive-command [add|rm]
        -B|--backup-path path 
        [--remote-port port] 
        [--remote-user username]
    

Adds and enables, or removes and disables, the archive command for every replication group in the Shardman cluster so that WAL files are put into the initialized backup repository.

add

Adds and enables the archive command for every replication group in the Shardman cluster.

rm

Disables the archive command for every replication group in the Shardman cluster. No additional parameters are required.

-B path
--backup-path path

Required when adding archive_command. Specifies the path to the backup catalog where the Shardman cluster backups should be stored.

--remote-port port

Specifies the remote SSH port for the replication group instances. Default: 22.

--remote-user username

Specifies the remote SSH user for the replication group instances. Default: postgres.

backup

Syntax:

shardman-ladle probackup backup -B|--backup-path path 
        -E|--etcd-path path 
        -b|--backup-mode MODE
        --compress
        --compress-algorithm algorithm
        --compress-level level
        [--remote-port port] 
        [--remote-user username]
        [--ssh-key path]
        [-t|--timeout seconds] 
        [-m|--maxtasks number_of_tasks]
    

Creates a backup of the Shardman cluster.

-B path
--backup-path path

Required. Specifies the path to the backup catalog where the Shardman cluster backups should be stored.

-E path
--etcd-path path

Required. Specifies the path to the catalog where the etcd dumps should be stored.

-b MODE
--backup-mode MODE

Required. Defines the backup mode: FULL, PAGE, or DELTA.

--compress

Enables backup compression. If this flag is not specified, compression is disabled. If the flag is specified, the compression parameters below should be specified.

--compress-algorithm algorithm

Defines the compression algorithm: zlib or pglz. Default: none.

--compress-level level

Defines the compression level, from 0 to 9. Default: 0.

--remote-port port

Specifies the remote SSH port for the replication group instances. Default: 22.

--remote-user username

Specifies the remote SSH user for the replication group instances. Default: postgres.

--ssh-key path

Specifies the SSH private key for remote SSH command execution. Default: $HOME/.ssh/id_rsa.

restore

Syntax:

shardman-ladle probackup restore
        -B|--backup-path path
        -i|--backup-id id
        [--metadata-only]
        [-t|--timeout seconds]
        [-m|--maxtasks number_of_tasks]

Restores Shardman cluster from the selected backup.

-B path
--backup-path path

Required. Specifies the path to the backup catalog where the Shardman cluster backups should be stored.

-i id
--backup-id id

Required. Specifies the ID of the backup to restore from.

--metadata-only

Perform metadata-only restore. By default, full restore is performed.

show

Syntax:

shardman-ladle probackup show 
        -B|--backup-path path 
        [--format format] 
    

Shows the list of backups of the Shardman cluster.

-B path
--backup-path path

Required. Specifies the path to the backup catalog where the Shardman cluster backups should be stored.

-f format
--format format

Specifies the output format: table or json. Default: table.

validate

Syntax:

shardman-ladle probackup validate 
        -B|--backup-path path 
        -i|--backup-id id
        [-t|--timeout seconds] 
        [-m|--maxtasks number_of_tasks] 
    

Checks the selected Shardman cluster backup for integrity.

-B path
--backup-path path

Required. Specifies the path to the backup catalog where the Shardman cluster backups should be stored.

-i id
--backup-id id

Required. Specifies the ID of the backup to validate.

Common Options

shardman-ladle common options are optional parameters that are not specific to the utility. They specify etcd connection settings, the cluster name, and a few more settings. By default, shardman-ladle tries to connect to the etcd store at 127.0.0.1:2379 and uses the cluster0 cluster name. The default log level is info.

-h, --help

Show brief usage information

--cluster-name cluster_name

Specifies the name for a cluster to operate on. The default is cluster0.

--log-level level

Specifies the log verbosity. Possible values of level are (from minimum to maximum): error, warn, info and debug. The default is info.

--retries number

Specifies how many times shardman-ladle retries a failing etcd request. If an etcd request fails, most likely, due to a connectivity issue, shardman-ladle retries it the specified number of times before reporting an error. The default is 5.

--session-timeout seconds

Specifies the session timeout for shardman-ladle locks. If there is no connectivity between shardman-ladle and the etcd store for the specified number of seconds, the lock is released. The default is 30.

--store-endpoints string

Specifies the etcd address in the format: http[s]://address[:port](,http[s]://address[:port])*. The default is http://127.0.0.1:2379.

--store-ca-file string

Specifies the CA bundle used to verify the certificate of an HTTPS-enabled etcd store server

--store-cert-file string

Specifies the certificate file for client identification by the etcd store

--store-key string

Specifies the private key file for client identification by the etcd store

--store-timeout duration

Specifies the timeout for an etcd request. The default is 5 seconds.

--version

Show shardman-utils version information

Environment

SDM_CLUSTER_NAME

An alternative to setting the --cluster-name option

SDM_LOG_LEVEL

An alternative to setting the --log-level option

SDM_NODES

An alternative to setting the --nodes option for addnodes and rmnodes

SDM_RETRIES

An alternative to setting the --retries option

SDM_SPEC_FILE

An alternative to setting the --spec-file option for init

SDM_STORE_ENDPOINTS

An alternative to setting the --store-endpoints option

SDM_STORE_CA_FILE

An alternative to setting the --store-ca-file option

SDM_STORE_CERT_FILE

An alternative to setting the --store-cert-file option

SDM_STORE_KEY

An alternative to setting the --store-key option

SDM_STORE_TIMEOUT

An alternative to setting the --store-timeout option

SDM_SESSION_TIMEOUT

An alternative to setting the --session-timeout option
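For example, the store endpoints and the cluster name can be set once in the environment instead of being passed as options on every invocation (a sketch; the endpoint values are placeholders):

```shell
# Equivalent to passing --store-endpoints and --cluster-name to each command:
export SDM_STORE_ENDPOINTS=http://n1:2379,http://n2:2379,http://n3:2379
export SDM_CLUSTER_NAME=cluster0
# Subsequent shardman-ladle invocations pick up these settings:
echo "$SDM_CLUSTER_NAME"    # prints "cluster0"
```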

Usage

Registering a Shardman Cluster

To register a Shardman cluster in the etcd store, run the following command:

shardman-ladle [common_options] init [-f|--spec-file spec_file_name]|spec_text

You must provide the string with the cluster specification. You can do it as follows:

  • On the command line — do not specify the -f option and pass the string in spec_text.

  • On the standard input — specify the -f option and pass - in spec_file_name.

  • In a file — specify the -f option and pass the filename in spec_file_name.
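The three ways of passing the specification can be sketched as follows (sdmspec.json is a hypothetical file name; common options are omitted for brevity):

```shell
# On the command line, as spec_text:
shardman-ladle init "$(cat sdmspec.json)"
# On the standard input, with -f -:
cat sdmspec.json | shardman-ladle init -f -
# From a file:
shardman-ladle init -f sdmspec.json
```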

Adding Nodes to a Shardman Cluster

To add nodes to a Shardman cluster, run the following command:

shardman-ladle [common_options] addnodes -n|--nodes node_names

You must specify the -n (--nodes) option to pass the comma-separated list of nodes to be added. Since all nodes are referred to by their hostnames, these hostnames must be correctly resolvable on all nodes.

If the addnodes command fails during execution, use the cleanup command to fix possible cluster configuration issues.

Performing Cleanup

By default, cleanup operates in the report-only mode, that is, the following command will only show actions to be done during actual cleanup:

shardman-ladle [common_options] cleanup

To perform the actual cleanup, run the following command:

shardman-ladle [common_options] cleanup -p|--processrepgroups

Removing Nodes from a Shardman cluster

To remove nodes from a Shardman cluster, run the following command:

shardman-ladle [common_options] rmnodes -n|--nodes node_names

Specify the -n (--nodes) option to pass the comma-separated list of nodes to be removed.

Note

Do not use the cleanup command to fix possible cluster configuration issues after a failure of rmnodes. Redo the rmnodes command instead.

If you do not care about the data, you can remove all nodes in a cluster by simply reinitializing the cluster. If a removed replication group contains local (non-sharded and non-global) tables, the data is silently lost after the replication group removal.

Getting the Health Status of Cluster Subsystems

To get a report on the health status of Shardman cluster subsystems in plain-text format, run the following command:

shardman-ladle [common_options] status

To get the report in JSON format, pass the value of json through the -f (--format) option. Each detected issue is reported as an unknown status, warning, error or fatal error. The tool can also report an operational error, which means that there was an issue during the cluster health check. When the command encounters a fatal or operational error, it stops further diagnostics. An error is considered fatal if it impacts higher-level subsystems. For example, an inconsistency in etcd metadata does not allow correct cluster operations and must be handled first, so there is no point in further diagnostics.

Backing up a Shardman Cluster

To back up a Shardman cluster, you can run the following command:

shardman-ladle [common_options] backup --datadir directory

You must pass the directory to write the output to through the --datadir option. You can limit the number of running concurrent tasks (pg_receivewal or pg_basebackup commands) by passing the limit through the --maxtasks option.
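For example, to write the backup to a local directory while limiting it to four concurrent tasks (the path is a placeholder):

```shell
# Back up the cluster, running at most 4 pg_basebackup/pg_receivewal tasks at once:
shardman-ladle backup --datadir=/var/backup/shardman --maxtasks=4
```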

Restoring a Shardman Cluster

shardman-ladle can perform either full restore or metadata-only restore of a Shardman cluster from a backup created by the backup command.

To perform full restore, you can run the following command:

shardman-ladle [common_options] recover --info file

Pass the file to load information about the backup from through the --info option. In most cases, set this option to point to the backup_info file in the backup directory or to its modified copy.

If you encounter issues with an etcd instance, it makes sense to perform metadata-only restore. To do this, you can run the following command:

shardman-ladle [common_options] recover --dumpfile file --metadata-only

You must pass the file to load the etcd metadata dump from through the --dumpfile option.

For both kinds of restore, you can specify --timeout so that the tool exits with an error if the cluster is not ready or the recovery is not complete after the specified number of seconds.

Before running the recover command, specify DataRestoreCommand and RestoreCommand in the backup_info file. DataRestoreCommand fetches the base backup and restores it to the stolon data directory. RestoreCommand fetches the WAL file and saves it to the stolon pg_wal directory. These commands can use the following substitutions:

%p

Destination path on the server.

%s

SystemId of the restored database (the same in the backup and in the restored cluster).

%f

Name of the WAL file to restore.

stolon-keeper runs both commands on each node in the cluster. Therefore:

  • Make the backup accessible to these nodes (for example, by storing it in a shared filesystem or by using a remote copy protocol, such as SFTP).

  • Commands to fetch the backup are executed as the operating system user under which stolon daemons work (usually postgres), so set the permissions for the backup files appropriately.

These examples show how to specify RestoreCommand and DataRestoreCommand:

  • If a backup is available through a passwordless SCP, you can use:

     "DataRestoreCommand": "scp -r user@host:/var/backup/shardman/%s/backup/* %p",
     "RestoreCommand": "scp user@host:/var/backup/shardman/%s/wal/%f %p"
      

  • If a backup is stored on NFS and available through /var/backup/shardman path, you can use:

     "DataRestoreCommand": "cp -r /var/backup/shardman/%s/backup/* %p",
     "RestoreCommand": "cp /var/backup/shardman/%s/wal/%f %p"
      

Backing up a Shardman Cluster Using the probackup Command

To back up a Shardman cluster using the probackup command, the following requirements must be met:

  • The Shardman cluster configuration parameter enable_csn_snapshot must be on; it is required for the cluster backup to be consistent. If this parameter is disabled, a consistent backup is not possible;

  • On the backup host, the Shardman utilities must be installed in /opt/pgpro/sdm-14/bin;

  • On the backup host and on each cluster node, pg_probackup must be installed in /opt/pgpro/sdm-14/bin;

  • On the backup host, the postgres Linux user and group must be created;

  • Passwordless SSH between the backup host and each Shardman cluster node must be configured for the postgres Linux user;

  • The backup directory must be created;

  • The postgres Linux user must be granted access to the backup directory;

  • The shardman-ladle utility must be run as the postgres Linux user;

  • The init subcommand must be successfully executed on the backup host to initialize the backup repository;

  • The archive-command subcommand must be successfully executed on the backup host to enable archive_command for each replication group, so that WALs are streamed into the initialized repository.

For example, on the backup host:

  groupadd postgres
  useradd -m -N -g postgres -r -d /var/lib/postgresql -s /bin/bash postgres
 

Then add SSH keys to provide passwordless SSH access between the backup host and the Shardman cluster hosts. Then on the backup host:

  apt-get install pg-probackup shardman-utils
  mkdir -p directory
  chown postgres:postgres directory -R
  shardman-ladle [common_options] probackup init --backup-path=directory --etcd-path=directory/etcd --remote-user=postgres --remote-port=22
  shardman-ladle [common_options] probackup archive-command --backup-path=directory --remote-user=postgres --remote-port=22
  

If the above requirements are met, run the backup subcommand to back up the cluster:

shardman-ladle [common_options] probackup backup --backup-path=directory --etcd-path=directory --backup-mode=MODE

You must pass the directories through the --backup-path and --etcd-path options and the backup mode through the --backup-mode option. Full and incremental backups are available via the FULL, DELTA, and PAGE values. You can also specify backup compression through the --compress, --compress-algorithm, and --compress-level options, as well as connection settings through the --remote-port and --remote-user options. You can limit the number of concurrent tasks during the backup by passing the limit through the --maxtasks option.

Restoring a Shardman Cluster Using the probackup Command

shardman-ladle in probackup mode can perform either full restore or metadata-only restore of a Shardman cluster from a backup created by the probackup backup command.

To perform a full or metadata-only restore, first select the backup to restore from. To show the list of available backups, run the following command:

shardman-ladle [common_options] probackup show --backup-path=path --format=format

The output is a list of backups with their IDs, in table or JSON format. Pick the needed backup ID and run the probackup restore command:

shardman-ladle [common_options] probackup restore --backup-path=path --backup-id=id

Pass the path to the backup repository through the --backup-path option and the backup ID through the --backup-id option.

If you encounter issues with an etcd instance, it makes sense to perform metadata-only restore. To do this, you can run the following command:

shardman-ladle [common_options] probackup restore --backup-path=path --backup-id=id --metadata-only

For both kinds of restore, you can specify --timeout so that the tool exits with an error if the cluster is not ready or the recovery is not complete after the specified number of seconds.

Examples

Initializing the Cluster

To initialize a Shardman cluster that has the cluster0 name and uses an etcd cluster consisting of the n1, n2, and n3 nodes listening on port 2379, ensure proper settings in the sdmspec.json spec file and run:

$ shardman-ladle --store-endpoints http://n1:2379,http://n2:2379,http://n3:2379 init -f sdmspec.json

Adding Nodes to the Cluster

To add the n1, n2, n3, and n4 nodes to the cluster, run:

$ shardman-ladle --store-endpoints http://n1:2379,http://n2:2379,http://n3:2379 addnodes -n n1,n2,n3,n4

Important

The number of nodes being added must be a multiple of Repfactor + 1. For example, with Repfactor = 1, nodes must be added in pairs.

Removing Nodes from the Cluster

To remove n1 and n2 nodes, along with clovers that contain them, from the cluster0 cluster, run:

$ shardman-ladle --store-endpoints http://n1:2379,http://n2:2379,http://n3:2379 rmnodes -n n1,n2

Getting the Cluster Status

Here is a sample status output from shardman-ladle:

$ shardman-ladle --store-endpoints http://n1:2379,http://n2:2379,http://n3:2379 status

=== Store status ===
  STATUS    MESSAGE    REPLICATION GROUP  NODE
  OK      Store is OK
=== Metadata status ===
  STATUS     MESSAGE      REPLICATION GROUP  NODE
  OK      Metadata is OK
=== Bowls status ===
  STATUS         MESSAGE        REPLICATION GROUP  NODE
  OK      Bowl on node n1 is OK                    n1
  OK      Bowl on node n2 is OK                    n2
  OK      Bowl on node n3 is OK                    n3
  OK      Bowl on node n4 is OK                    n4
=== Replication Groups status ===
  STATUS             MESSAGE             REPLICATION GROUP  NODE
  OK      Replication group clover-1-n1  clover-1-n1
          is OK
  OK      Replication group clover-1-n2  clover-1-n2
          is OK
  OK      Replication group clover-2-n3  clover-2-n3
          is OK
  OK      Replication group clover-2-n4  clover-2-n4
          is OK
=== Dictionary status ===
  STATUS             MESSAGE             REPLICATION GROUP  NODE
  OK      Replication group clover-1-n1  clover-1-n1
          dictionary is OK
  OK      Replication group clover-1-n2  clover-1-n2
          dictionary is OK
  OK      Replication group clover-2-n3  clover-2-n3
          dictionary is OK
  OK      Replication group clover-2-n4  clover-2-n4
          dictionary is OK

Performing Backup and Recovery

To create a backup of the cluster0 cluster using etcd at etcdserver listening on port 2379 and store it in the local directory /var/backup/shardman, run:

$ shardman-ladle --store-endpoints http://etcdserver:2379 backup --datadir=/var/backup/shardman

Assume that you are performing a recovery from a backup to the cluster0 cluster using etcd at etcdserver listening on port 2379 and you take the backup description from the /var/backup/shardman/backup_info file. Edit the /var/backup/shardman/backup_info file, set DataRestoreCommand, RestoreCommand as necessary and run:

$ shardman-ladle --store-endpoints http://etcdserver:2379 recover --info /var/backup/shardman/backup_info

For metadata-only restore, run:

$ shardman-ladle --store-endpoints http://etcdserver:2379 recover --metadata-only --dumpfile /var/backup/shardman/etcd_dump

Performing Backup and Recovery with probackup command

To create a backup of the cluster0 cluster using etcd at etcdserver listening on port 2379 and store it in the local directory /var/backup/shardman, first initialize the backup repository with the init subcommand:

$ shardman-ladle --store-endpoints http://etcdserver:2379 probackup init --backup-path=/var/backup/shardman --etcd-path=/var/backup/etcd_dump

Then add and enable archive_command with the archive-command subcommand:

$ shardman-ladle --store-endpoints http://etcdserver:2379 probackup archive-command add --backup-path=/var/backup/shardman

If the repository is successfully initialized and archive_command is successfully added, create a FULL backup with the backup subcommand:

$ shardman-ladle --store-endpoints http://etcdserver:2379 probackup backup --backup-path=/var/backup/shardman --etcd-path=/var/backup/etcd_dump --backup-mode=FULL --compress --compress-algorithm=zlib --compress-level=5

For a DELTA or PAGE backup, run the backup subcommand with the --backup-mode parameter set to DELTA or PAGE:

$ shardman-ladle --store-endpoints http://etcdserver:2379 probackup backup --backup-path=/var/backup/shardman --etcd-path=/var/backup/etcd_dump --backup-mode=DELTA --compress --compress-algorithm=zlib --compress-level=5

To show the IDs of the created backups, run the show subcommand:

$ shardman-ladle --store-endpoints http://etcdserver:2379 probackup show --backup-path=/var/backup/shardman --format=table

 REPLICATION GROUP    HOSTNAME  BACKUPIDS  BACKUP MODE  LSNS          BACKUP TIMESTAMP
 7125062069167771757  n1        RFP1YS     DELTA        0/250001B8    2022-08-27 19:14:06.259797832 +0000 UTC
 7125062069167752779  n2                   DELTA        0/250001A8
 7125062069166812789  n3                   DELTA        0/25000108
 7125062069167512479  n4                   DELTA        0/250001E8
 7125062069167771757  n1        RFP1FI     FULL         0/250000B8    2022-07-27 19:14:06.259797832 +0000 UTC
 7125062069167752779  n2                   FULL         0/250000A8
 7125062069166812789  n3                   FULL         0/25000008
 7125062069167512479  n4                   FULL         0/250000E8

To validate the created backup, run the validate subcommand:

$ shardman-ladle --store-endpoints http://etcdserver:2379 probackup validate --backup-path=/var/backup/shardman --backup-id=RFP1FI

Assume that you are performing a recovery from a backup to the cluster0 cluster using etcd at etcdserver listening on port 2379, and you take the backup ID from the show command:

$ shardman-ladle --store-endpoints http://etcdserver:2379 probackup restore --backup-path=/var/backup/shardman --backup-id=RFP1FI
  

Finally, re-enable archive_command:

$ shardman-ladle --store-endpoints http://etcdserver:2379 probackup archive-command add --backup-path=/var/backup/shardman

For metadata-only restore, run:

$ shardman-ladle --store-endpoints http://etcdserver:2379 probackup restore --metadata-only --backup-path=/var/backup/shardman --backup-id=RFP1FI

See Also

shardmanctl, sdmspec.json, shardman-bowl