sdmspec.json

sdmspec.json — Shardman initialization file

Synopsis

   sdmspec.json
  

Description

shardmanctl uses the sdmspec.json configuration file during Shardman cluster initialization. A shardman-utils package provides a sample configuration file.

sdmspec.json file consists of these major documents: LadleSpec, ClusterSpec and StolonSpec, embedded in ClusterSpec. LadleSpec determines basic filesystem paths used by Shardman and global settings of the cluster. It is primarily used by shardmanctl when it deploys new nodes and by shardmand when it determines which services should be running on the current node. ClusterSpec contains database-related settings, i. e., administrative and replication user logins and authentication method, FDW parameters and stolon configuration (StolonSpec). ClusterSpec is used by shardmanctl tool, which work with the DBMS and do not care about how underlying nodes are organized.

LadleSpec Parameters

Repfactor

Integer determining how many replicas shardmanctl should configure for each DBMS.

PlacementPolicy

String determining the policy of placing DBMS instances. Currently, cross and manual placement policy is only supported. The former value clover is used as an alias for cross policy.

With cross placement policy, nodes are grouped in clovers, where each node is running the master DBMS server and replicas for all other nodes in the clover. The number of nodes in a clover is determined by Repfactor and equals Repfactor + 1.

manual placement policy allows you to manually add/remove the required number of replicas to/from the specified replication groups. In this case, Rеpfactor is only used for recommendation purposes and does not impose restrictions.

DataDir

Allows you to specify a directory other than the default one (/var/lib/pgpro/sdm-14/data) for storing data.

PGsInitialPort

Ports starting with this integer are assigned to PostgeSQL instances.

ClusterSpec Parameters

PgSuAuthMethod

Authentication method used by the administrative user to connect to the DBMS. Can be any authentication method supported by PostgreSQL. scram-sha-256 is currently recommended. md5 is currently allowed but not recommended.

Default: trust.

PgSuUsername

Name of the administrative DBMS user. Created on cluster initialization. Defaults to the name of the effective user running shardmanctl init.

PgSuPassword

Password for the administrative DBMS user. Can be changed using shardmanctl update.

PgReplAuthMethod

Authentication method used by the replication user to connect to the DBMS. Can be any authentication method supported by PostgreSQL. scram-sha-256 is currently recommended. md5 is currently allowed but not recommended.

Default: trust.

PgReplUsername

Name of the replication DBMS user. Created on cluster initialization.

Default: repluser.

PgReplPassword

Password for the replication DBMS user. Can be changed using shardmanctl update.

StolonSpec

stolon cluster specification. For more details, see StolonSpec Parameters. Can be changed using shardmanctl update.

FDWOptions

This object contains FDW settings.

These settings can be changed using shardmanctl update (with the exception of settings related to authorization, server connection, ssl and kerberos, as well as the service, target_session_attrs options).

Foreign servers corresponding to Shardman replication groups will also get extended_features setting automatically enabled. Never set this parameter for postgres_fdw foreign servers which you define for your own purposes (for example, to load data into Shardman cluster).

StolonSpec Parameters

The StolonSpec specification can include all usual stolon options described in Stolon Cluster Specification. However, the following options should be carefully tuned for a Shardman cluster.

pgHBA

JSON array of pg_hba.conf strings. The default one allows PgSuUsername user's access from anywhere with PgSuAuthMethod authentication method. If the value of defaultSUReplAccessMode is strict, pg_hba.conf strings must explicitly allow PgSuUsername and PgReplUsername users' access from all Shardman cluster nodes.

synchronousReplication

Determines whether replicas should use synchronous replication. Should be true in a Shardman cluster.

Default: true.

maxSynchronousStandbys

Maximum number of required synchronous standbys when synchronous replication is enabled. Should be Repfactor in a Shardman cluster. The default is Repfactor.

automaticPgRestart

Determines whether a DBMS instance should be automatically restarted after a change of the pgParameters hash table that requires a restart. Should be enabled in a Shardman cluster.

Default: true.

pgParameters

Hash table that determines PostgreSQL settings, including Shardman-specific settings.

Shardman-specific PostgreSQL Settings

The following settings in pgParameters are Shardman-specific:

enable_csn_snapshot (boolean)

Enables or disables Commit Sequence Number (CSN) based tracking of the transaction visibility for a snapshot.

PostgreSQL uses the clock timestamp as a CSN, so enabling CSN-based snapshots can be useful for implementing global snapshots and global transaction visibility.

When this parameter is enabled, PostgreSQL creates the pg_csn directory under PGDATA to keep track of CSN and XID mappings.

Default: off.

csn_snapshot_defer_time (integer)

Specifies the minimal age of records that are allowed to be vacuumed, in seconds.

All global transactions must start on all participant nodes within csn_snapshot_defer_time seconds after start, otherwise, they are aborted with a csn snapshot too old error.

Default: 5.

csn_lsn_map_size (integer)

Size of CSNLSNMap.

The commit record of each completed transaction in Shardman contains the assigned CSN for this transaction. This value, together with the LSNof this record, forms a pair of values (CSN, LSN). Each of the cluster nodes stores a certain number of such pairs in RAM in a special structure - the CSNLSNMap. This map is used to get the syncpoint. See the "Architecture" chapter "Syncpoints and consistent backup" section for more information.

The default is 1024.

foreign_join_fast_path (boolean)

Turns on a fast path for foreign join planning. When it is on, foreign join paths for SELECT queries are searched before all other possible paths and the search stops for a join as soon as a foreign join path is found.

Default: off.

optimize_correlated_subqueries (boolean)

Enables or disables the query planner's logic to transform correlated subqueries into semi-joins.

The default is on.

postgres_fdw.enforce_foreign_join (boolean)

Turns on alternative estimations for foreign join costs, which highly increases chances for join of several foreign tables referring to the same server to be pushed down. The cost of original join is estimated to be calculated as (1 - 1/(cost + 1)), where cost is an originally estimated cost for this remote join.

Default: off.

postgres_fdw.optimize_cursors (boolean)

Sets postgres_fdw to try fetching the first portion of cursor data immediately after declaration and delay the cursor closing.

This postgres_fdw parameter forces it to avoid closing cursors after the end of scan. Cursors are closed at the end of transaction.

Default: off.

postgres_fdw.subplan_pushdown (boolean)

Enables or disables postgres_fdw logic to push down subqueries, referencing only to foreign server tables, to this foreign server.

The default is off.

postgres_fdw.use_twophase (enum)

Sets postgres_fdw to use the two-phase commit (2PC) protocol for distributed transactions.

This postgres_fdw parameter forces it to use a two-phase commit if the transaction touches several nodes. When set to auto, a two-phase commit is only used in transactions with enable_csn_snapshot=true and isolation level equal to or higher than REPEATABLE READ.

Temporary tables cannot be used in 2PC transactions.

Default: auto.

postgres_fdw.estimate_as_hashjoin (boolean)

When enabled, the planner estimates a foreign join cost in a way similar to a cost of a hash-join whenever possible. This cost is compared to the default cost (which is similar to nested loops) and the smaller cost is selected for the path.

Default: off.

shardman.broadcast_ddl (boolean)

Sets Shardman extension to broadcast DDL statements to all replication groups.

When this parameter is on, Shardman extension broadcasts supported DDL statements to all replication groups if it does make sense for those statements. You can enable/disable this behavior anytime.

Default: off.

shardman.config_uuid (string)

Internal parameter used by Shardman tools to find out the config version. Should never be set manually. Ignored if set via shardmanctl update or shardmanctl init.

Default: empty.

shardman.num_parts (integer)

Specifies the default number of sharded table partitions.

A sharded table has this default number of partitions unless num_parts is specified in CREATE TABLE.

To allow scaling, shardman.num_parts should be larger than the expected maximum number of nodes in a Shardman cluster.

Possible values are from 1 to 1000.

Default: 20.

shardman.rgid (integer)

Specifies the replication group ID of a Shardman node.

This parameter is set by Shardman utilities when the node is added to the cluster and should never be changed manually.

Default: -1.

shardman.sync_schema (boolean)

Sets Shardman to propagate all DDL statements that touch sharded and global relations to all replication groups.

When this parameter is on, Shardman broadcasts all supported utility statements touching sharded and global relations to all replication groups. It is not recommended to turn this off.

Default: on.

shardman.query_engine_mode (enum)

Switches between modes of query planning/execution. Possible values are none and text.

none means that query planning/execution will not use the Silk transport.

text means that text query representation is transferred via Silk transport for remote execution.

Default: none.

shardman.silk_use_ip (string)

Silk transport uses IP address, specified by this parameter, for node identification. If the host name is specified, it is resolved and the first IP address corresponding to this name, is used.

Default: hostname node.

shardman.silk_listen_ip (string)

The Silk routing daemon listens for incoming connections on this IP address. If the host name is specified, it is resolved and the first IP address corresponding to this name, is used.

Default: shardman.silk_use_ip.

shardman.silk_use_port (integer)

The Silk routing daemon listens for incoming connections on this port. This setting should be the same for all nodes in the Shardman cluster.

Default: 8888.

shardman.silk_num_workers (integer)

Number of background workers allocated for distributed execution. This setting must be less than max_worker_processes (including auxilary postgres worker processes).

Default: 2.

shardman.database (string)

Name of the database that all Silk workers connect to.

Default: postgres.

shardman.monitor_interval (integer)

shardman.monitor_interval is deprecated and acts as noop.

Use shardman.monitor_dxact_interval instead.

shardman.monitor_dxact_interval (integer)

Interval between checks for outdated prepared transactions.

The shardman monitor background process wakes up every shardman.monitor_dxact_interval seconds and attempts to check and resolve any prepared transactions that did not complete and became outdated for some reason. To resolve these transactions, the shardman monitor process determines the coordinator of the transaction and requests the transaction status from the coordinator. Based on the status of the transaction, shardman monitor will either roll back or commit the transaction.

To disable the prepared transaction resolution logic, set shardman.monitor_dxact_interval to 0.

Default: 5 (seconds).

shardman.monitor_trim_csnxid_map_interval (integer)

Each cluster node freezes its own xmin value for csn_snapshot_defer_time seconds to support global transactions. Large csn_snapshot_defer_time values can negatively impact the performance. shardman monitor has a routine that every shardman.monitor_trim_csnxid_map_interval seconds updates xmin on all nodes to the minimum possible value (taking into account active transactions).

The background routine will run on only one node in the Shardman cluster. Note that this will give an additional load on this node.

To disable such updates, set shardman.monitor_trim_csnxid_map_interval to 0.

Default: 5 (seconds).

shardman.monitor_dxact_timeout (integer)

Maximum allowed age of prepared transactions before a resolution attempt.

During the resolution of a prepared transaction, shardman monitor determines whether the transaction is outdated or not. A transaction becomes outdated if it was prepared more than shardman.monitor_dxact_timeout seconds ago.

Default: 5 (seconds).

shardman.trim_csnxid_map_naptime (integer)

Specifies the minimum delay between xmin updates on all nodes. See shardman.monitor_trim_csnxid_map_interval for more information.

Possible values are from 1 to 600.

Default: 5.

shardman.monitor_deadlock_interval (integer)

Interval between checks for distributed deadlock conditions.

The shardman monitor background process wakes up every shardman.monitor_deadlock_interval seconds and searches for distributed deadlocks in the cluster. It gathers information about mutual locks from all nodes and looks for circular dependencies between transactions. If it detects a deadlock, it resolves it by canceling one of the backend processes involved in the lock.

To disable the distributed deadlock resolution logic, set shardman.monitor_deadlock_interval to 0.

Default: 2 (seconds).

Examples

Spec File for a Cluster with Enabled scram-sha-256 Authentication

This is the contents of an example sdmspec.json configuration file:

{
   "LadleSpec":{
      "PGsInitialPort": 5432,
      "Repfactor": 1,
      "PlacementPolicy": "cross"
   },
   "ClusterSpec":{
      "PgSuAuthMethod": "scram-sha-256",
      "PgSuUsername":"postgres",
      "PgSuPassword": "ChangeMe",
      "PgReplUsername":"repluser",
      "PgReplAuthMethod":"scram-sha-256",
      "PgReplPassword":"ChangeMe",
      "FDWOptions" :{
          "async_capable": "true",
          "batch_size": "100",
          "fetch_size": "50000",
          "fdw_tuple_cost": "0.2"
      },
      "StolonSpec":{
         "pgHBA":[
            "host replication postgres 0.0.0.0/0 scram-sha-256",
            "host replication postgres ::0/0 scram-sha-256",
            "host all all 0.0.0.0/0 scram-sha-256"
         ],
         "pgParameters":{
            "log_statement":"none",
            "log_line_prefix":"%m [%r][%p]",
            "log_min_messages":"INFO",
            "wal_level":"logical",
            "max_worker_processes":"16",
            "max_logical_replication_workers": "9",
            "shared_preload_libraries":"postgres_fdw, shardman",
            "default_transaction_isolation":"repeatable read",
            "max_prepared_transactions":"200",
            "csn_snapshot_defer_time":"15",
            "enable_csn_snapshot":"true",
            "postgres_fdw.estimate_as_hashjoin":"on",
            "enable_partitionwise_aggregate":"on",
            "enable_partitionwise_join":"on",
            "postgres_fdw.enforce_foreign_join":"on",
            "postgres_fdw.optimize_cursors":"on",
            "postgres_fdw.estimate_indexscan":"on",
            "postgres_fdw.use_twophase":"on",
         }
      }
   }
}

From LadleSpec, you can see that a Shardman cluster initialized with this spec file has Repfactor equal to 1 (one replica for each master). ClusterSpec shows that two special users are created in this cluster — superuser postgres and replication user repluser with ChangeMe passwords. They can be authenticated using the md5 or scram-sha-256 authorization method. One postgres_fdw fetch operation will get up to 50000 rows from the remote server. The cost of fetching one row is set to a reasonably high value to make PostgreSQL planner consider conditions pushdown-attractive. pg_hba.conf settings allow postgres user access from anywhere using a replication protocol; all other users can access any database from anywhere. Since defaultSUReplAccessMode is not set to strict, stolon will automatically add entries that allow PgSuUsername user's (postgres) access to any database from anywhere and PgReplUsername user's (repluser) replication access from anywhere.

Several important Shardman-specific parameters are set in the pgParameters hash table. These are:

wal_level

Should be set to logical for Shardman to work correctly.

shared_preload_libraries

Should include postgres_fdw and shardman extensions in the specified order.

max_logical_replication_workers

Should be rather high since the rebalance process uses up to max(max_replication_slots, max_logical_replication_workers, max_worker_processes, max_wal_senders)/3 concurrent threads.

max_prepared_transactions

Should be rather high since Shardman utilities use the 2PC protocol. If postgres_fdw.use_twophase is true, postgres_fdw also uses 2PC.

enable_csn_snapshot

Should be enabled to achieve a true repeatable read isolation level in a distributed system.

csn_snapshot_defer_time

All global transactions must start on all participant nodes within csn_snapshot_defer_time seconds after start, otherwise they will be aborted.

enable_partitionwise_aggregate
enable_partitionwise_join

Set to on to enable optimizations for partitioned tables.

pgpro_stats parameters

pgpro_stats.pgss_max_nodes_tracked (integer)

Sets the maximum number of nodes that are tracked by pgpro_stats for query fragments.

Actually sets maximum amount of status entries that pgpro_stats is possible to store for pgpro_stats_lup function. Doesn't affects statistics tracking itself.

The default is 2048.

pgpro_stats.transport_compression (string)

Sets algorithm for transport compression during statistics transferring between nodes.

Transport compression uses (if specified) to compress statistic entries passed from shard nodes to the coordinator. Can be pglz (always available), zlib, lz4, zstd (only if compiled with Shardman) or off.

The default is pglz.

See Also

shardmanctl