sdmspec.json — Shardman initialization file
sdmspec.json
shardmanctl
uses the
sdmspec.json configuration file during
Shardman cluster initialization.
A shardman-utils package provides a sample
configuration file.
sdmspec.json file consists of these major documents:
LadleSpec, ClusterSpec and
StolonSpec, embedded in ClusterSpec.
LadleSpec determines basic filesystem paths used by
Shardman and global settings of the cluster.
It is primarily used by shardmanctl when
it deploys new nodes and by shardmand
when it determines which services should be running on the current node.
ClusterSpec contains database-related settings, i. e.,
administrative and replication user logins and authentication method,
FDW parameters and stolon configuration
(StolonSpec).
ClusterSpec is used by
shardmanctl
tool, which work with the DBMS and do not
care about how underlying nodes are organized.
LadleSpec ParametersRepfactorInteger determining how many replicas shardmanctl should configure for each DBMS.
PlacementPolicy
String determining the policy of placing DBMS instances. Currently,
cross and manual placement policy is only supported.
The former value clover is used as an alias for cross policy.
With cross placement policy, nodes are grouped in
clovers, where each node is running the master DBMS server
and replicas for all other nodes in the clover. The number of nodes in
a clover is determined by Repfactor and
equals Repfactor + 1.
manual placement policy allows you to manually add/remove the required number of
replicas to/from the specified replication groups. In this case, Rеpfactor
is only used for recommendation purposes and does not impose restrictions.
DataDir
Allows you to specify a directory other than the
default one (/var/lib/pgpro/sdm-14/data) for storing data.
PGsInitialPortPorts starting with this integer are assigned to PostgeSQL instances.
ClusterSpec ParametersPgSuAuthMethod
Authentication method used by the administrative user to connect to
the DBMS. Can be any authentication method supported by PostgreSQL.
scram-sha-256 is currently recommended. md5 is currently allowed but not recommended.
Default: trust.
PgSuUsername
Name of the administrative DBMS user. Created on cluster
initialization. Defaults to the name of the effective user running
shardmanctl init.
PgSuPassword
Password for the administrative DBMS user. Can be changed using
shardmanctl
update.
PgReplAuthMethod
Authentication method used by the replication user to connect to
the DBMS. Can be any authentication method supported by PostgreSQL.
scram-sha-256 is currently recommended. md5 is currently allowed but not recommended.
Default: trust.
PgReplUsernameName of the replication DBMS user. Created on cluster initialization.
Default: repluser.
PgReplPassword
Password for the replication DBMS user. Can be changed using
shardmanctl
update.
StolonSpec
stolon cluster specification.
For more details, see StolonSpec Parameters.
Can be changed using
shardmanctl
update.
FDWOptionsThis object contains FDW settings.
These settings can be changed using
shardmanctl
update
(with the exception of settings related to authorization, server connection,
ssl and kerberos, as well as the service, target_session_attrs options).
Foreign servers corresponding to Shardman
replication groups will also get extended_features
setting automatically enabled. Never set this parameter for
postgres_fdw foreign servers which you define for your own purposes
(for example, to load data into Shardman cluster).
StolonSpec Parameters
The StolonSpec specification can include all usual
stolon options described in
Stolon Cluster Specification.
However, the following options should be carefully tuned for
a Shardman cluster.
pgHBA
JSON array of pg_hba.conf strings. The default one allows
PgSuUsername user's access from anywhere with
PgSuAuthMethod authentication method. If the value of
defaultSUReplAccessMode is strict,
pg_hba.conf strings must explicitly allow PgSuUsername and
PgReplUsername users' access from all
Shardman cluster nodes.
synchronousReplication
Determines whether replicas should use synchronous replication.
Should be true in
a Shardman cluster.
Default: true.
maxSynchronousStandbys
Maximum number of required synchronous standbys when synchronous
replication is enabled.
Should be ≥ Repfactor in
a Shardman cluster.
The default is Repfactor.
automaticPgRestartDetermines whether a DBMS instance should be automatically restarted after a change of the pgParameters hash table that requires a restart. Should be enabled in a Shardman cluster.
Default: true.
pgParametersHash table that determines PostgreSQL settings, including Shardman-specific settings.
The following settings in pgParameters are
Shardman-specific:
enable_csn_snapshot (boolean)
Enables or disables Commit Sequence Number (CSN) based tracking of the transaction visibility for a snapshot.
PostgreSQL uses the clock timestamp as a CSN, so enabling CSN-based snapshots can be useful for implementing global snapshots and global transaction visibility.
When this parameter is enabled, PostgreSQL creates the
pg_csn directory under PGDATA to keep
track of CSN and XID mappings.
Default: off.
csn_snapshot_defer_time
(integer)
Specifies the minimal age of records that are allowed to be vacuumed, in seconds.
All global transactions must start on all participant nodes within
csn_snapshot_defer_time seconds after start,
otherwise, they are aborted with a
“csn snapshot too old” error.
Default: 5.
csn_lsn_map_size
(integer)
Size of CSNLSNMap.
The commit record of each completed transaction in Shardman contains the assigned
CSN for this transaction. This value, together with the LSNof this record,
forms a pair of values (CSN, LSN). Each of the cluster nodes stores a certain number of such
pairs in RAM in a special structure - the CSNLSNMap. This map is used to get the syncpoint.
See the "Architecture" chapter "Syncpoints and consistent backup" section for more information.
The default is 1024.
foreign_join_fast_path
(boolean)
Turns on a fast path for foreign join planning. When it is on, foreign join paths for SELECT queries are searched before all other possible paths and the search stops for a join as soon as a foreign join path is found.
Default: off.
optimize_correlated_subqueries
(boolean)
Enables or disables the query planner's logic to transform correlated subqueries into semi-joins.
The default is on.
postgres_fdw.enforce_foreign_join
(boolean)
Turns on alternative estimations for foreign join costs, which highly increases chances for join of several foreign
tables referring to the same server to be pushed down. The cost of original join is estimated
to be calculated as (1 - 1/(cost + 1)), where cost
is an originally estimated cost for this remote join.
Default: off.
postgres_fdw.optimize_cursors
(boolean)
Sets postgres_fdw to try fetching the first portion of cursor data immediately after declaration and delay the cursor closing.
This postgres_fdw parameter forces it to avoid closing cursors after the end of scan. Cursors are closed at the end of transaction.
Default: off.
postgres_fdw.subplan_pushdown
(boolean)
Enables or disables postgres_fdw logic to push down subqueries, referencing only to foreign server tables, to this foreign server.
The default is off.
postgres_fdw.use_twophase
(enum)
Sets postgres_fdw to use the two-phase commit (2PC) protocol for distributed transactions.
This postgres_fdw parameter forces it to use
a two-phase commit if the transaction touches several nodes. When set to auto,
a two-phase commit is only used in transactions with enable_csn_snapshot=true
and isolation level equal to or higher than REPEATABLE READ.
Temporary tables cannot be used in 2PC transactions.
Default: auto.
postgres_fdw.estimate_as_hashjoin (boolean)
When enabled, the planner estimates a foreign join cost in a way similar to a cost of a hash-join whenever possible. This cost is compared to the default cost (which is similar to nested loops) and the smaller cost is selected for the path.
Default: off.
shardman.broadcast_ddl
(boolean)
Sets Shardman extension to broadcast DDL statements to all replication groups.
When this parameter is on, Shardman extension broadcasts supported DDL statements to all replication groups if it does make sense for those statements. You can enable/disable this behavior anytime.
Default: off.
shardman.config_uuid
(string)
Internal parameter used by Shardman tools to find out the config version.
Should never be set manually. Ignored if set via
shardmanctl
update or
shardmanctl
init.
Default: empty.
shardman.num_parts
(integer)
Specifies the default number of sharded table partitions.
A sharded table has this default
number of partitions
unless num_parts
is specified in CREATE TABLE.
To allow scaling, shardman.num_parts should be larger
than the expected maximum
number of nodes in a Shardman cluster.
Possible values are from 1 to 1000.
Default: 20.
shardman.rgid
(integer)
Specifies the replication group ID of a Shardman node.
This parameter is set by Shardman utilities when the node is added to the cluster and should never be changed manually.
Default: -1.
shardman.sync_schema
(boolean)
Sets Shardman to propagate all DDL statements that touch sharded and global relations to all replication groups.
When this parameter is on, Shardman broadcasts all supported utility statements touching sharded and global relations to all replication groups. It is not recommended to turn this off.
Default: on.
shardman.query_engine_mode
(enum)
Switches between modes of query planning/execution. Possible values are none and text.
none means that query planning/execution will not use the Silk transport.
text means that text query representation is transferred via Silk transport for remote execution.
Default: none.
shardman.silk_use_ip
(string)
Silk transport uses IP address, specified by this parameter, for node identification. If the host name is specified, it is resolved and the first IP address corresponding to this name, is used.
Default: hostname node.
shardman.silk_listen_ip
(string)
The Silk routing daemon listens for incoming connections on this IP address. If the host name is specified, it is resolved and the first IP address corresponding to this name, is used.
Default: shardman.silk_use_ip.
shardman.silk_use_port
(integer)
The Silk routing daemon listens for incoming connections on this port. This setting should be the same for all nodes in the Shardman cluster.
Default: 8888.
shardman.silk_num_workers
(integer)
Number of background workers allocated for distributed execution.
This setting must be less than max_worker_processes (including auxilary postgres worker processes).
Default: 2.
shardman.database
(string)
Name of the database that all Silk workers connect to.
Default: postgres.
shardman.monitor_interval
(integer)
shardman.monitor_interval is deprecated and acts as noop.
Use shardman.monitor_dxact_interval instead.
shardman.monitor_dxact_interval
(integer)
Interval between checks for outdated prepared transactions.
The shardman monitor background process wakes up every shardman.monitor_dxact_interval
seconds and attempts to check and resolve any prepared transactions that did
not complete and became outdated for some reason. To resolve these transactions,
the shardman monitor process determines the coordinator of
the transaction and requests the transaction status from the coordinator. Based
on the status of the transaction, shardman monitor will
either roll back or commit the transaction.
To disable the prepared transaction resolution logic, set shardman.monitor_dxact_interval to 0.
Default: 5 (seconds).
shardman.monitor_trim_csnxid_map_interval
(integer)
Each cluster node freezes its own xmin value for csn_snapshot_defer_time seconds to support global transactions.
Large csn_snapshot_defer_time values can negatively impact the performance. shardman monitor
has a routine that every shardman.monitor_trim_csnxid_map_interval seconds updates xmin on all
nodes to the minimum possible value (taking into account active transactions).
The background routine will run on only one node in the Shardman cluster. Note that this will give an additional load on this node.
To disable such updates, set shardman.monitor_trim_csnxid_map_interval to 0.
Default: 5 (seconds).
shardman.monitor_dxact_timeout
(integer)
Maximum allowed age of prepared transactions before a resolution attempt.
During the resolution of a prepared transaction,
shardman monitor determines whether the transaction is outdated or not.
A transaction becomes outdated if it was prepared more than shardman.monitor_dxact_timeout seconds ago.
Default: 5 (seconds).
shardman.trim_csnxid_map_naptime
(integer)
Specifies the minimum delay between xmin updates on all nodes. See shardman.monitor_trim_csnxid_map_interval for more information.
Possible values are from 1 to 600.
Default: 5.
shardman.monitor_deadlock_interval
(integer)
Interval between checks for distributed deadlock conditions.
The shardman monitor background process wakes up every shardman.monitor_deadlock_interval
seconds and searches for distributed deadlocks in the cluster. It gathers information about
mutual locks from all nodes and looks for circular dependencies between transactions. If it
detects a deadlock, it resolves it by canceling one of the backend processes involved in the lock.
To disable the distributed deadlock resolution logic, set shardman.monitor_deadlock_interval to 0.
Default: 2 (seconds).
This is the contents of an example sdmspec.json configuration file:
{
"LadleSpec":{
"PGsInitialPort": 5432,
"Repfactor": 1,
"PlacementPolicy": "cross"
},
"ClusterSpec":{
"PgSuAuthMethod": "scram-sha-256",
"PgSuUsername":"postgres",
"PgSuPassword": "ChangeMe",
"PgReplUsername":"repluser",
"PgReplAuthMethod":"scram-sha-256",
"PgReplPassword":"ChangeMe",
"FDWOptions" :{
"async_capable": "true",
"batch_size": "100",
"fetch_size": "50000",
"fdw_tuple_cost": "0.2"
},
"StolonSpec":{
"pgHBA":[
"host replication postgres 0.0.0.0/0 scram-sha-256",
"host replication postgres ::0/0 scram-sha-256",
"host all all 0.0.0.0/0 scram-sha-256"
],
"pgParameters":{
"log_statement":"none",
"log_line_prefix":"%m [%r][%p]",
"log_min_messages":"INFO",
"wal_level":"logical",
"max_worker_processes":"16",
"max_logical_replication_workers": "9",
"shared_preload_libraries":"postgres_fdw, shardman",
"default_transaction_isolation":"repeatable read",
"max_prepared_transactions":"200",
"csn_snapshot_defer_time":"15",
"enable_csn_snapshot":"true",
"postgres_fdw.estimate_as_hashjoin":"on",
"enable_partitionwise_aggregate":"on",
"enable_partitionwise_join":"on",
"postgres_fdw.enforce_foreign_join":"on",
"postgres_fdw.optimize_cursors":"on",
"postgres_fdw.estimate_indexscan":"on",
"postgres_fdw.use_twophase":"on",
}
}
}
}
From LadleSpec, you can see that
a Shardman cluster initialized with this spec file
has Repfactor equal to 1 (one replica for each master).
ClusterSpec shows that two special
users are created in this cluster — superuser postgres
and replication user repluser with
ChangeMe passwords. They can be authenticated using the
md5 or scram-sha-256 authorization method. One postgres_fdw
fetch operation will get up to 50000 rows from the remote server. The cost of fetching
one row is set to a reasonably high value to make PostgreSQL
planner consider conditions pushdown-attractive. pg_hba.conf settings
allow postgres user access from anywhere using a replication
protocol; all other users can access any database from anywhere. Since
defaultSUReplAccessMode is not set to
strict, stolon will automatically add
entries that allow PgSuUsername
user's (postgres) access to any database from anywhere and
PgReplUsername user's (repluser)
replication access from anywhere.
Several important Shardman-specific parameters
are set in the pgParameters hash table. These are:
wal_level
Should be set to logical for
Shardman to work correctly.
shared_preload_librariesShould include postgres_fdw and shardman extensions in the specified order.
max_logical_replication_workers
Should be rather high since the rebalance process uses up to
max(
concurrent threads.
max_replication_slots,
max_logical_replication_workers, max_worker_processes,
max_wal_senders)/3
max_prepared_transactions
Should be rather high since Shardman utilities
use the 2PC protocol. If postgres_fdw.use_twophase
is true, postgres_fdw also uses 2PC.
enable_csn_snapshotShould be enabled to achieve a true repeatable read isolation level in a distributed system.
csn_snapshot_defer_time
All global transactions must start on all participant nodes within
csn_snapshot_defer_time seconds after start,
otherwise they will be aborted.
enable_partitionwise_aggregateenable_partitionwise_join
Set to on to enable optimizations for partitioned tables.
pgpro_stats parameterspgpro_stats.pgss_max_nodes_tracked (integer)
Sets the maximum number of nodes that are tracked by pgpro_stats for query fragments.
Actually sets maximum amount of status entries that pgpro_stats is possible to store for
pgpro_stats_lup function. Doesn't affects statistics tracking itself.
The default is 2048.
pgpro_stats.transport_compression (string)
Sets algorithm for transport compression during statistics transferring between nodes.
Transport compression uses (if specified) to compress statistic entries passed from shard nodes to the coordinator. Can be pglz (always available), zlib,
lz4, zstd (only if compiled with Shardman)
or off.
The default is pglz.