sdmspec.json

sdmspec.json — Shardman initialization file

Synopsis

   sdmspec.json
  

Description

shardmanctl uses the sdmspec.json configuration file during Shardman cluster initialization. A shardman-utils package provides a sample configuration file.

sdmspec.json file consists of these major documents: LadleSpec, ClusterSpec and StolonSpec, embedded in ClusterSpec. LadleSpec determines basic filesystem paths used by Shardman and global settings of the cluster. It is primarily used by shardmanctl when it deploys new nodes and by shardmand when it determines which services should be running on the current node. ClusterSpec contains database-related settings, i. e., administrative and replication user logins and authentication method, FDW parameters and stolon configuration (StolonSpec). ClusterSpec is used by shardmanctl tool, which work with the DBMS and do not care about how underlying nodes are organized.

LadleSpec Parameters

StoreConnInfo

Object that has the following string properties: Endpoints, CAFile, CertFile, and Key. It is used by services to figure out how to connect to etcd. Usually, StoreConnInfo is not specified explicitly, but is filled in automatically based on shardmanctl options related to etcd store that are used during Shardman cluster initialization.

Repfactor

Integer determining how many replicas shardmanctl should configure for each DBMS.

PlacementPolicy

String determining the policy of placing DBMS instances. Currently, cross and manual placement policy is only supported. The former value clover is used as an alias for cross policy.

With cross placement policy, nodes are grouped in clovers, where each node is running the master DBMS server and replicas for all other nodes in the clover. The number of nodes in a clover is determined by Repfactor and equals Repfactor + 1.

manual placement policy allows you to manually add/remove the required number of replicas to the specified replication groups. In this case, Rеpfactor is used for recommendation purposes only and does not impose restrictions.

DataDir

DataDir allows to specify a directory other than the default (/var/lib/pgpro/sdm-14/data) for storing data.

PGsInitialPort

Ports starting with this integer are assigned to PostgeSQL instances.

ClusterSpec Parameters

PgSuAuthMethod

Authentication method used by the administrative user to connect to the DBMS. Can be any authentication method supported by PostgreSQL. scram-sha-256 is currently recommended. md5 is currently allowed but not recommended. The default is trust.

PgSuUsername

Name of the administrative DBMS user. Created on cluster initialization. Defaults to the name of the effective user running shardmanctl init.

PgSuPassword

Password for the administrative DBMS user. Can be changed using shardmanctl update.

PgReplAuthMethod

Authentication method used by the replication user to connect to the DBMS. Can be any authentication method supported by PostgreSQL. scram-sha-256 is currently recommended. md5 is currently allowed but not recommended. The default is trust.

PgReplUsername

Name of the replication DBMS user. Created on cluster initialization. The default is repluser.

PgReplPassword

Password for the replication DBMS user. Can be changed using shardmanctl update.

StolonSpec

stolon cluster specification. For more details, see StolonSpec Parameters. Can be changed using shardmanctl update.

FDWOptions

This object contains FDW settings.

These settings can be changed using shardmanctl update (with the exception of parameters related to authorization, server connection, ssl and kerberos, as well as the service, target_session_attrs options).

Foreign servers corresponding to Shardman replication groups will also get extended_features setting automatically enabled. Never set this parameter for postgres_fdw foreign servers which you define for your own purposes (for example, to load data into Shardman cluster).

StolonSpec Parameters

The StolonSpec specification can include all usual stolon options described in Stolon Cluster Specification. However, the following options should be carefully tuned for a Shardman cluster.

pgHBA

JSON array of pg_hba.conf strings. The default one allows PgSuUsername user's access from anywhere with PgSuAuthMethod authentication method. If the value of defaultSUReplAccessMode is strict, pg_hba.conf strings must explicitly allow PgSuUsername and PgReplUsername users' access from all Shardman cluster nodes.

synchronousReplication

Determines whether replicas should use synchronous replication. Should be true in a Shardman cluster. The default is true.

maxSynchronousStandbys

Maximum number of required synchronous standbys when synchronous replication is enabled. Should be Repfactor in a Shardman cluster. The default is Repfactor.

automaticPgRestart

Determines whether a DBMS instance should be automatically restarted after a change of the pgParameters hash table that requires a restart. Should be enabled in a Shardman cluster. The default is true.

pgParameters

Hash table that determines PostgreSQL settings, including Shardman-specific settings.

Shardman-specific PostgreSQL Settings

The following settings in pgParameters are Shardman-specific:

enable_csn_snapshot (boolean)

Enables or disables CSN-based tracking of the transaction visibility for a snapshot.

PostgreSQL uses the clock timestamp as a CSN, so enabling CSN-based snapshots can be useful for implementing global snapshots and global transaction visibility.

When this parameter is enabled, PostgreSQL creates the pg_csn directory under PGDATA to keep track of CSN and XID mappings.

This parameter has significant performance overhead.

The default is off.

csn_snapshot_defer_time (integer)

Specifies the minimal age of records that are allowed to be vacuumed, in seconds.

All global transactions must start on all participant nodes within csn_snapshot_defer_time seconds after start, otherwise, they are aborted with a csn snapshot too old error.

The default is 5.

foreign_join_fast_path (boolean)

Turns on fast path for foreign join planning. When it is on, foreign join paths for SELECT queries are searched before all other possible paths and search stops for a join as soon as foreign join path is found.

The default is off.

postgres_fdw.enforce_foreign_join (boolean)

Turns on alternative estimations for foreign join costs, which highly increases chances for join of several foreign tables referring to the same server to be pushed down.

The default is off.

postgres_fdw.optimize_cursors (boolean)

Sets postgres_fdw to try fetching the first portion of cursor data immediately after declaration and delay the cursor closing.

This postgres_fdw parameter forces it to avoid closing cursors after the end of scan. Cursors are closed at the end of transaction.

The default is off.

postgres_fdw.use_twophase (enum)

Sets postgres_fdw to use the two-phase commit (2PC) protocol for distributed transactions.

This postgres_fdw parameter forces it to use a two-phase commit if the transaction touches several nodes. When set to auto, two-phase commit is only used in transactions with enable_csn_snapshot=true and isolation level equal to or higher than "repeatable read".

Temporary tables cannot be used in 2PC transactions.

The default is auto.

postgres_fdw.estimate_as_hashjoin (boolean)

When enabled, the planner estimates foreign join costs similar to costs of a hash-join whenever possible. This cost is compared to the default cost (which is similar to nested loop) and the smaller cost is selected for the path. The default is off.

shardman.broadcast_ddl (boolean)

Sets Shardman extension to broadcast DDL statements to all replication groups.

When this parameter is on, Shardman extension broadcasts supported DDL statements to all replication groups if it does make sense for those statements. You can enable/disable this behavior anytime.

The default is off.

shardman.config_uuid (string)

Internal parameter used by Shardman tools to find out config version. Should never be set manually. Ignored if set via shardmanctl update or shardmanctl init.

The default is empty.

shardman.num_parts (integer)

Specifies the default number of sharded table partitions.

A sharded table has this default number of partitions unless num_parts is specified in CREATE TABLE.

To allow scaling, shardman.num_parts should be larger than the expected maximum number of nodes in a Shardman cluster.

The default is 20. Possible values are from 1 to 1000.

shardman.rgid (integer)

Specifies the replication group ID of a Shardman node.

This parameter is set by Shardman utilities when the node is added to the cluster and should never be changed manually.

The default is -1.

shardman.sync_schema (boolean)

Sets Shardman to propagate all DDL statements that touch sharded and global relations to all replication groups.

When this parameter is on, Shardman broadcasts all supported utility statements touching sharded and global relations to all replication groups. It is not recommended to turn this off.

The default is on.

shardman.silk_enabled (boolean)

Enables or disables Silk experimental transport.

When this parameter is on, on server startup Silk initialization is performed, including startup of Silk background workers. Changing this parameter requires a server restart.

The default is off.

shardman.query_engine_mode (enum)

Switches between modes of query planning/execution. Valid values are none, plan and text.

none means that query planning/execution will not use the silk transport.

plan means that necessary query subplan is serialized and transferred via silk transport for remote execution.

text means that text query representation is transferred via silk transport for remote execution.

The default is none. Has no effect if shardman.silk_enabled is off.

shardman.silk_use_ip (string)

Silk transport uses IP address, specified by this parameter, for node identification. If a host name is specified, it is resolved, and the first IP address, corresponding to this name, is used.

The default is node hostname.

shardman.silk_listen_ip (string)

Silk routing daemon listens for incoming connections on this IP address. If a host name is specified, it is resolved, and the first IP address, corresponding to this name, is used.

The default is shardman.silk_use_ip.

shardman.silk_use_port (integer)

Silk routing daemon listens for incoming connections on this port. This setting should be the same for all nodes in the Shardman cluster.

The default is 8888.

shardman.silk_num_workers (integer)

Number of background workers allocated for distributed execution. This setting must be lesser then max_worker_processes (paying attention to auxilary postgres worker processes).

The default is 2.

shardman.database (string)

Name of the database that all silk workers connect to.

The default is postgres.

shardman.monitor_interval (integer)

shardman.monitor_interval is deprecated and acts as noop.

Use shardman.monitor_dxact_interval instead.

shardman.monitor_dxact_interval (integer)

Interval between checking for outdated prepared transactions.

The shardman monitor background process wakes up every shardman.monitor_dxact_interval seconds and attempts to check and resolve any prepared transactions that did not complete and became outdated for some reason. To resolve these transactions, the shardman monitor process determines the coordinator of transaction and requests a transaction status from the coordinator. Based on the status of the transaction, the shardman monitor will either roll back or commit the transaction.

To disable prepared transactions resolution logic set shardman.monitor_dxact_interval to 0.

The default is 5 seconds.

shardman.monitor_trim_csnxid_map_interval (integer)

Each cluster node freezes the self xmin value for csn_snapshot_defer_time seconds to support global transactions. Large csn_snapshot_defer_time values can negatively impact performance. The shardman monitor has routine that every shardman.monitor_trim_csnxid_map_interval seconds updates xmin on all nodes to the minimum possible (taking into account active transactions).

Background routine will run on only one node in the Shardman cluster. Note that this will give an additional load on this node.

To disable such updates set shardman.monitor_trim_csnxid_map_interval to 0.

The default is 5 seconds.

shardman.monitor_dxact_timeout (integer)

Maximum allowed age of prepared transactions before resolution try.

During the resolution of a prepared transaction, the shardman monitor determines whether the transaction is outdated or not. A transaction becomes outdated if it was prepared more than shardman.monitor_dxact_timeout seconds ago.

The default is 5 seconds.

shardman.trim_csnxid_map_naptime (integer)

Specifies the minimum delay between xmin updates on all nodes. See shardman.monitor_trim_csnxid_map_interval for more information.

The default is 5. Possible values are from 1 to 600.

shardman.monitor_deadlock_interval (integer)

Interval between checking for distributed deadlock conditions.

The shardman monitor background process wakes up every shardman.monitor_deadlock_interval seconds and searches for distributed deadlocks in the cluster. It gathers information about mutual locks from all nodes and looks for circular dependencies between transactions. If it detects a deadlock, it resolves it by canceling one of the backend processes involved in the lock.

To disable distributed deadlock resolution logic set shardman.monitor_deadlock_interval to 0.

The default is 2 seconds.

Examples

Spec File for a Cluster with Enabled scram-sha-256 Authentication

This is the contents of an example sdmspec.json configuration file:

{
   "LadleSpec":{
      "PGsInitialPort": 5432,
      "Repfactor": 1,
      "PlacementPolicy": "cross"
   },
   "ClusterSpec":{
      "PgSuAuthMethod": "scram-sha-256",
      "PgSuUsername":"postgres",
      "PgSuPassword": "ChangeMe",
      "PgReplUsername":"repluser",
      "PgReplAuthMethod":"scram-sha-256",
      "PgReplPassword":"ChangeMe",
      "FDWOptions" :{
          "async_capable": "true",
          "batch_size": "100",
          "fetch_size": "50000",
          "fdw_tuple_cost": "0.2"
      },
      "StolonSpec":{
         "pgHBA":[
            "host replication postgres 0.0.0.0/0 scram-sha-256",
            "host replication postgres ::0/0 scram-sha-256",
            "host all all 0.0.0.0/0 scram-sha-256"
         ],
         "pgParameters":{
            "log_statement":"none",
            "log_line_prefix":"%m [%r][%p]",
            "log_min_messages":"INFO",
            "wal_level":"logical",
            "max_worker_processes":"16",
            "max_logical_replication_workers": "9",
            "shared_preload_libraries":"postgres_fdw, shardman",
            "default_transaction_isolation":"repeatable read",
            "max_prepared_transactions":"200",
            "csn_snapshot_defer_time":"15",
            "enable_csn_snapshot":"true",
            "postgres_fdw.estimate_as_hashjoin":"on",
            "enable_partitionwise_aggregate":"on",
            "enable_partitionwise_join":"on",
            "postgres_fdw.enforce_foreign_join":"on",
            "postgres_fdw.optimize_cursors":"on",
            "postgres_fdw.estimate_indexscan":"on",
            "postgres_fdw.use_twophase":"on",
         }
      }
   }
}

From LadleSpec, you can see that a Shardman cluster initialized with this spec file has Repfactor equal to 1 (one replica for each master). ClusterSpec shows that two special users are created in this cluster — superuser postgres and replication user repluser with ChangeMe passwords. They can be authenticated using the md5 or scram-sha-256 authorization method. One postgres_fdw fetch operation will get up to 50000 rows from the remote server. The cost of fetching one row is set to a reasonably high value to make PostgreSQL planner consider conditions pushdown-attractive. pg_hba.conf settings allow postgres user access from anywhere using a replication protocol; all other users can access any database from anywhere. Since defaultSUReplAccessMode is not set to strict, stolon will automatically add entries that allow PgSuUsername user's (postgres) access to any database from anywhere and PgReplUsername user's (repluser) replication access from anywhere.

Several important Shardman-specific parameters are set in the pgParameters hash table. These are:

wal_level

Should be set to logical for Shardman to work correctly.

shared_preload_libraries

Should include postgres_fdw and shardman extensions in the specified order.

max_logical_replication_workers

Should be rather high since the rebalance process uses up to max(max_replication_slots, max_logical_replication_workers, max_worker_processes, max_wal_senders)/3 concurrent threads.

max_prepared_transactions

Should be rather high since Shardman utilities use the 2PC protocol. If postgres_fdw.use_twophase is true, postgres_fdw also uses 2PC.

enable_csn_snapshot

Should be enabled to achieve a true repeatable read isolation level in a distributed system.

csn_snapshot_defer_time

All global transactions must start on all participant nodes within csn_snapshot_defer_time seconds after start, otherwise they will be aborted.

enable_partitionwise_aggregate
enable_partitionwise_join

Set to on to enable optimizations for partitioned tables.

See Also

shardmanctl