sdmspec.json — Shardman initialization file
sdmspec.json
shardmanctl uses the
sdmspec.json configuration file during
Shardman cluster initialization.
A shardman-utils package provides a sample
configuration file.
sdmspec.json file consists of these major documents:
LadleSpec, ClusterSpec and
StolonSpec, embedded in ClusterSpec.
LadleSpec determines basic filesystem paths used by
Shardman and global settings of the cluster.
It is primarily used by shardmanctl when
it deploys new nodes and by shardmand
when it determines which services should be running on the current node.
ClusterSpec contains database-related settings, i. e.,
administrative and replication user logins and authentication method,
FDW parameters and stolon configuration
(StolonSpec).
ClusterSpec is used by
shardmanctl tool, which work with the DBMS and do not
care about how underlying nodes are organized.
LadleSpec ParametersStoreConnInfo
Object that has the following string properties:
Endpoints, CAFile,
CertFile, and Key.
It is used by services to figure out how to connect to etcd.
Usually, StoreConnInfo is not specified explicitly,
but is filled in automatically based on shardmanctl
options related to etcd store that are used during
Shardman cluster initialization.
RepfactorInteger determining how many replicas shardmanctl should configure for each DBMS.
PlacementPolicy
String determining the policy of placing DBMS instances. Currently,
cross and manual placement policy is only supported.
The former value clover is used as an alias for cross policy.
With cross placement policy, nodes are grouped in
clovers, where each node is running the master DBMS server
and replicas for all other nodes in the clover. The number of nodes in
a clover is determined by Repfactor and
equals Repfactor + 1.
manual placement policy allows you to manually add/remove the required number of
replicas to the specified replication groups. In this case, Rеpfactor
is used for recommendation purposes only and does not impose restrictions.
DataDir
DataDir allows to specify a directory other than the
default (/var/lib/pgpro/sdm-14/data) for storing data.
PGsInitialPortPorts starting with this integer are assigned to PostgeSQL instances.
ClusterSpec ParametersPgSuAuthMethod
Authentication method used by the administrative user to connect to
the DBMS. Can be any authentication method supported by PostgreSQL.
scram-sha-256 is currently recommended. md5 is currently allowed but not recommended. The default is
trust.
PgSuUsername
Name of the administrative DBMS user. Created on cluster
initialization. Defaults to the name of the effective user running
shardmanctl init.
PgSuPasswordPassword for the administrative DBMS user. Can be changed using shardmanctl update.
PgReplAuthMethod
Authentication method used by the replication user to connect to
the DBMS. Can be any authentication method supported by PostgreSQL.
scram-sha-256 is currently recommended. md5 is currently allowed but not recommended. The default is
trust.
PgReplUsername
Name of the replication DBMS user. Created on cluster
initialization. The default is repluser.
PgReplPasswordPassword for the replication DBMS user. Can be changed using shardmanctl update.
StolonSpec
stolon cluster specification.
For more details, see StolonSpec Parameters.
Can be changed using shardmanctl update.
FDWOptionsThis object contains FDW settings.
These settings can be changed using shardmanctl update
(with the exception of parameters related to authorization, server connection,
ssl and kerberos, as well as the service, target_session_attrs options).
Foreign servers corresponding to Shardman
replication groups will also get extended_features
setting automatically enabled. Never set this parameter for
postgres_fdw foreign servers which you define for your own purposes
(for example, to load data into Shardman cluster).
StolonSpec Parameters
The StolonSpec specification can include all usual
stolon options described in
Stolon Cluster Specification.
However, the following options should be carefully tuned for
a Shardman cluster.
pgHBA
JSON array of pg_hba.conf strings. The default one allows
PgSuUsername user's access from anywhere with
PgSuAuthMethod authentication method. If the value of
defaultSUReplAccessMode is strict,
pg_hba.conf strings must explicitly allow PgSuUsername and
PgReplUsername users' access from all
Shardman cluster nodes.
synchronousReplication
Determines whether replicas should use synchronous replication.
Should be true in
a Shardman cluster.
The default is true.
maxSynchronousStandbys
Maximum number of required synchronous standbys when synchronous
replication is enabled.
Should be ≥ Repfactor in
a Shardman cluster.
The default is Repfactor.
automaticPgRestart
Determines whether a DBMS instance should be automatically restarted after
a change of the pgParameters hash table that requires a restart.
Should be enabled in a Shardman cluster.
The default is true.
pgParametersHash table that determines PostgreSQL settings, including Shardman-specific settings.
The following settings in pgParameters are
Shardman-specific:
enable_csn_snapshot (boolean)
Enables or disables CSN-based tracking of the transaction visibility for a snapshot.
PostgreSQL uses the clock timestamp as a CSN, so enabling CSN-based snapshots can be useful for implementing global snapshots and global transaction visibility.
When this parameter is enabled, PostgreSQL creates the
pg_csn directory under PGDATA to keep
track of CSN and XID mappings.
This parameter has significant performance overhead.
The default is off.
csn_snapshot_defer_time
(integer)
Specifies the minimal age of records that are allowed to be vacuumed, in seconds.
All global transactions must start on all participant nodes within
csn_snapshot_defer_time seconds after start,
otherwise, they are aborted with a
“csn snapshot too old” error.
The default is 5.
foreign_join_fast_path
(boolean)
Turns on fast path for foreign join planning. When it is on, foreign join paths for SELECT queries are searched before all other possible paths and search stops for a join as soon as foreign join path is found.
The default is off.
postgres_fdw.enforce_foreign_join
(boolean)
Turns on alternative estimations for foreign join costs, which highly increases chances for join of several foreign tables referring to the same server to be pushed down.
The default is off.
postgres_fdw.optimize_cursors
(boolean)
Sets postgres_fdw to try fetching the first portion of cursor data immediately after declaration and delay the cursor closing.
This postgres_fdw parameter forces it to avoid closing cursors after the end of scan. Cursors are closed at the end of transaction.
The default is off.
postgres_fdw.use_twophase
(enum)
Sets postgres_fdw to use the two-phase commit (2PC) protocol for distributed transactions.
This postgres_fdw parameter forces it to use a two-phase commit if the transaction touches several nodes. When set to auto, two-phase commit is only used in transactions with enable_csn_snapshot=true and isolation level equal to or higher than "repeatable read".
Temporary tables cannot be used in 2PC transactions.
The default is auto.
postgres_fdw.estimate_as_hashjoin (boolean)
When enabled, the planner estimates foreign join costs similar
to costs of a hash-join whenever possible. This cost is compared to the
default cost (which is similar to nested loop) and the smaller cost is selected
for the path. The default is off.
shardman.broadcast_ddl
(boolean)
Sets Shardman extension to broadcast DDL statements to all replication groups.
When this parameter is on, Shardman extension broadcasts supported DDL statements to all replication groups if it does make sense for those statements. You can enable/disable this behavior anytime.
The default is off.
shardman.config_uuid
(string)
Internal parameter used by Shardman tools to find out config version. Should never be set manually. Ignored if set via shardmanctl update or shardmanctl init.
The default is empty.
shardman.num_parts
(integer)
Specifies the default number of sharded table partitions.
A sharded table has this default
number of partitions
unless num_parts
is specified in CREATE TABLE.
To allow scaling, shardman.num_parts should be larger
than the expected maximum
number of nodes in a Shardman cluster.
The default is 20.
Possible values are from 1 to 1000.
shardman.rgid
(integer)
Specifies the replication group ID of a Shardman node.
This parameter is set by Shardman utilities when the node is added to the cluster and should never be changed manually.
The default is -1.
shardman.sync_schema
(boolean)
Sets Shardman to propagate all DDL statements that touch sharded and global relations to all replication groups.
When this parameter is on, Shardman broadcasts all supported utility statements touching sharded and global relations to all replication groups. It is not recommended to turn this off.
The default is on.
shardman.silk_enabled
(boolean)
Enables or disables Silk experimental transport.
When this parameter is on, on server startup Silk initialization is performed,
including startup of Silk background workers. Changing this parameter requires a server restart.
The default is off.
shardman.query_engine_mode
(enum)
Switches between modes of query planning/execution. Valid values are none, plan and text.
none means that query planning/execution will not use the silk transport.
plan means that necessary query subplan is serialized and transferred via silk transport for remote execution.
text means that text query representation is transferred via silk transport for remote execution.
The default is none. Has no effect if shardman.silk_enabled is off.
shardman.silk_use_ip
(string)
Silk transport uses IP address, specified by this parameter, for node identification. If a host name is specified, it is resolved, and the first IP address, corresponding to this name, is used.
The default is node hostname.
shardman.silk_listen_ip
(string)
Silk routing daemon listens for incoming connections on this IP address. If a host name is specified, it is resolved, and the first IP address, corresponding to this name, is used.
The default is shardman.silk_use_ip.
shardman.silk_use_port
(integer)
Silk routing daemon listens for incoming connections on this port. This setting should be the same for all nodes in the Shardman cluster.
The default is 8888.
shardman.silk_num_workers
(integer)
Number of background workers allocated for distributed execution.
This setting must be lesser then max_worker_processes (paying attention to auxilary postgres worker processes).
The default is 2.
shardman.database
(string)
Name of the database that all silk workers connect to.
The default is postgres.
shardman.monitor_interval
(integer)
shardman.monitor_interval is deprecated and acts as noop.
Use shardman.monitor_dxact_interval instead.
shardman.monitor_dxact_interval
(integer)
Interval between checking for outdated prepared transactions.
The shardman monitor background process wakes up every shardman.monitor_dxact_interval seconds and attempts to check and resolve any prepared transactions that did not complete and became outdated for some reason. To resolve these transactions, the shardman monitor process determines the coordinator of transaction and requests a transaction status from the coordinator. Based on the status of the transaction, the shardman monitor will either roll back or commit the transaction.
To disable prepared transactions resolution logic set shardman.monitor_dxact_interval to 0.
The default is 5 seconds.
shardman.monitor_trim_csnxid_map_interval
(integer)
Each cluster node freezes the self xmin value for csn_snapshot_defer_time seconds to support global transactions. Large csn_snapshot_defer_time values can negatively impact performance. The shardman monitor has routine that every shardman.monitor_trim_csnxid_map_interval seconds updates xmin on all nodes to the minimum possible (taking into account active transactions).
Background routine will run on only one node in the Shardman cluster. Note that this will give an additional load on this node.
To disable such updates set shardman.monitor_trim_csnxid_map_interval to 0.
The default is 5 seconds.
shardman.monitor_dxact_timeout
(integer)
Maximum allowed age of prepared transactions before resolution try.
During the resolution of a prepared transaction, the shardman monitor determines whether the transaction is outdated or not. A transaction becomes outdated if it was prepared more than shardman.monitor_dxact_timeout seconds ago.
The default is 5 seconds.
shardman.trim_csnxid_map_naptime
(integer)
Specifies the minimum delay between xmin updates on all nodes. See shardman.monitor_trim_csnxid_map_interval for more information.
The default is 5.
Possible values are from 1 to 600.
shardman.monitor_deadlock_interval
(integer)
Interval between checking for distributed deadlock conditions.
The shardman monitor background process wakes up every shardman.monitor_deadlock_interval seconds and searches for distributed deadlocks in the cluster. It gathers information about mutual locks from all nodes and looks for circular dependencies between transactions. If it detects a deadlock, it resolves it by canceling one of the backend processes involved in the lock.
To disable distributed deadlock resolution logic set shardman.monitor_deadlock_interval to 0.
The default is 2 seconds.
This is the contents of an example sdmspec.json configuration file:
{
"LadleSpec":{
"PGsInitialPort": 5432,
"Repfactor": 1,
"PlacementPolicy": "cross"
},
"ClusterSpec":{
"PgSuAuthMethod": "scram-sha-256",
"PgSuUsername":"postgres",
"PgSuPassword": "ChangeMe",
"PgReplUsername":"repluser",
"PgReplAuthMethod":"scram-sha-256",
"PgReplPassword":"ChangeMe",
"FDWOptions" :{
"async_capable": "true",
"batch_size": "100",
"fetch_size": "50000",
"fdw_tuple_cost": "0.2"
},
"StolonSpec":{
"pgHBA":[
"host replication postgres 0.0.0.0/0 scram-sha-256",
"host replication postgres ::0/0 scram-sha-256",
"host all all 0.0.0.0/0 scram-sha-256"
],
"pgParameters":{
"log_statement":"none",
"log_line_prefix":"%m [%r][%p]",
"log_min_messages":"INFO",
"wal_level":"logical",
"max_worker_processes":"16",
"max_logical_replication_workers": "9",
"shared_preload_libraries":"postgres_fdw, shardman",
"default_transaction_isolation":"repeatable read",
"max_prepared_transactions":"200",
"csn_snapshot_defer_time":"15",
"enable_csn_snapshot":"true",
"postgres_fdw.estimate_as_hashjoin":"on",
"enable_partitionwise_aggregate":"on",
"enable_partitionwise_join":"on",
"postgres_fdw.enforce_foreign_join":"on",
"postgres_fdw.optimize_cursors":"on",
"postgres_fdw.estimate_indexscan":"on",
"postgres_fdw.use_twophase":"on",
}
}
}
}
From LadleSpec, you can see that
a Shardman cluster initialized with this spec file
has Repfactor equal to 1 (one replica for each master).
ClusterSpec shows that two special
users are created in this cluster — superuser postgres
and replication user repluser with
ChangeMe passwords. They can be authenticated using the
md5 or scram-sha-256 authorization method. One postgres_fdw
fetch operation will get up to 50000 rows from the remote server. The cost of fetching
one row is set to a reasonably high value to make PostgreSQL
planner consider conditions pushdown-attractive. pg_hba.conf settings
allow postgres user access from anywhere using a replication
protocol; all other users can access any database from anywhere. Since
defaultSUReplAccessMode is not set to
strict, stolon will automatically add
entries that allow PgSuUsername
user's (postgres) access to any database from anywhere and
PgReplUsername user's (repluser)
replication access from anywhere.
Several important Shardman-specific parameters
are set in the pgParameters hash table. These are:
wal_level
Should be set to logical for
Shardman to work correctly.
shared_preload_libraries
Should include postgres_fdw and shardman extensions in the specified order.
max_logical_replication_workers
Should be rather high since the rebalance process uses up to
max(
concurrent threads.
max_replication_slots,
max_logical_replication_workers, max_worker_processes,
max_wal_senders)/3
max_prepared_transactions
Should be rather high since Shardman utilities
use the 2PC protocol. If postgres_fdw.use_twophase
is true, postgres_fdw also uses 2PC.
enable_csn_snapshotShould be enabled to achieve a true repeatable read isolation level in a distributed system.
csn_snapshot_defer_time
All global transactions must start on all participant nodes within
csn_snapshot_defer_time seconds after start,
otherwise they will be aborted.
enable_partitionwise_aggregateenable_partitionwise_join
Set to on to enable optimizations for partitioned tables.