shardmanctl uses the
sdmspec.json configuration file during
Postgres Pro Shardman cluster initialization.
The shardman-utils package provides a sample
configuration file.
The sdmspec.json file contains basic filesystem paths used by
Postgres Pro Shardman, global settings of the cluster,
database-related settings (i.e., administrative and replication user logins and the authentication method),
FDW parameters, and the shard configuration
(ShardSpec).
Note that there are a number of internal Postgres Pro Shardman parameters that, if modified by the user, can result in total cluster failure. These parameters are:
shardman.cluster_uuid defines the version of a running
cluster that the node belongs to.
shardman.config_uuid defines the config version.
Ignored if set via
shardmanctl
config update or
shardmanctl
init.
shardman.manual_execution controls consistent operations
on global objects.
shardman.silk_never_restart prohibits restarting the multiplexer
workers in case of an error.
shardman.pre_promote_mode applies the
consistent promotion mechanism (from standby to primary).
Repfactor #Integer determining how many replicas shardmanctl should configure for each DBMS. This setting can only be changed for a Postgres Pro Shardman cluster in manual-topology mode.
DataDir #
Allows you to specify a directory other than the
default one (/var/lib/pgpro/sdm-17/data) for storing data.
This parameter cannot be changed after the cluster has been initialized.
PGsInitialPort #Ports starting with this integer are assigned to PostgreSQL instances. This parameter cannot be changed after the cluster has been initialized.
SilkInitialPort #Ports starting with this integer are assigned to Silk (Postgres Pro Shardman InterLinK) instances. This parameter cannot be changed after the cluster has been initialized.
AuthMethod #
Authentication method used by the administrative user to connect to
the DBMS. Can be any authentication method supported by PostgreSQL.
scram-sha-256 is currently recommended. md5 is currently allowed but not recommended.
This parameter cannot be changed after the cluster has been initialized.
Located under a separate Users block
for each array element.
Default: trust.
Groups #
An array whose elements can have two possible values: su for
superuser or repl for replication.
HTTP #
Defines settings for the secure HTTP/HTTPS connection, with Port being
the API port and PortMetrics being the port for
metrics. If these ports are the same, API and metrics are served on
the same port.
Default: 15432.
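For illustration, a fragment of sdmspec.json that serves the API and metrics on different ports might look like this (the port values are arbitrary examples):

```json
{
  "HTTP": {
    "Port": 15432,
    "PortMetrics": 15433
  }
}
```

With identical values, as in the sample file shipped with shardman-utils, both endpoints share one port.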
Name #
Name of the user. Created on cluster
initialization. Defaults to the name of the effective user running
shardmanctl init.
This parameter cannot be changed after the cluster has been initialized.
Located under a separate Users block
for each array element.
Password #
Password for the user. Can be changed using
shardmanctl
config update credentials.
Located under a separate Users block
for each array element.
PgSuSSLCert #Client certificate for the administrative DBMS user.
PgSSLRootCert #Location of the root certificate file for the DBMS user connection.
PgSuSSLKey #Client private key for the administrative DBMS user.
PgSSLMode #
SSL mode for the DBMS user. Allowed values: verify-ca and verify-full.
PgReplSSLCert #Client certificate for the replication DBMS user.
PgReplSSLKey #Client private key for the replication DBMS user.
ShardSpec #
Shard cluster specification.
For more details, see ShardSpec Parameters.
Can be changed using
shardmanctl
config update.
FDWOptions #This object contains FDW settings.
These settings can be changed using
shardmanctl
config update
(with the exception of settings related to authorization, server connection,
SSL, and Kerberos, as well as the service and target_session_attrs options).
Foreign servers corresponding to Postgres Pro Shardman
replication groups also get the extended_features
setting automatically enabled. Never set this parameter for
postgres_fdw foreign servers that you define for your own purposes
(for example, to load data into a Postgres Pro Shardman cluster).
ShardSpec Parameters #
The ShardSpec specification lists a set of
parameters that can be tuned for a specific replication group.
pgHBA #
JSON array of pg_hba.conf strings.
The default value allows
users from the su group access from anywhere with the
AuthMethod authentication method. If the value of
defaultSUReplAccessMode is strict,
pg_hba.conf strings must explicitly allow users
from the su or repl groups access from all
Postgres Pro Shardman cluster nodes.
The cluster software does not fail even if writing to
pg_hba.conf is forbidden.
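As a sketch, a strict-mode pgHBA array inside ShardSpec could explicitly list the cluster node addresses; the subnet below is a placeholder:

```json
{
  "ShardSpec": {
    "pgHBA": [
      "host all postgres 192.0.2.0/24 scram-sha-256",
      "host replication postgres 192.0.2.0/24 scram-sha-256"
    ]
  }
}
```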
forceSuUserLocalPeerAuth
#
When enabled, sets peer authentication via a Unix socket for
the postgres user, provided strictUserHBA
is not set to true.
Default: false.
maxSynchronousStandbys #
Maximum number of required synchronous standbys when synchronous
replication is enabled.
Should be >= Repfactor in
a Postgres Pro Shardman cluster.
Default: Repfactor.
strictUserHBA #
Prohibits adding automatically generated lines to the pg_hba.conf file.
Default: false.
automaticPgRestart #Determines whether a DBMS instance should be automatically restarted after a change of the pgParameters hash table that requires a restart. Should be enabled in a Postgres Pro Shardman cluster.
Default: true.
masterDemotionEnabled #Enables master demotion in case the replica group master loses connectivity with etcd. The master attempts to connect to each of its standby nodes to determine whether any of them has become the master. If it discovers another master, it shuts down its own DBMS instance until connectivity with etcd is restored. If the master fails to connect to one of its standby nodes for a long time, a DBMS instance shutdown occurs.
Default: false.
masterDemotionTimeout #
The timeout during which the master attempts to connect to its standbys in cases
where connectivity with etcd is lost.
Works only if the masterDemotionEnabled parameter is set to true.
Default: 30s.
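A minimal ShardSpec fragment enabling master demotion with a longer timeout might be (the timeout value is illustrative):

```json
{
  "ShardSpec": {
    "masterDemotionEnabled": true,
    "masterDemotionTimeout": "60s"
  }
}
```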
minSyncMonitorEnabled #
Enables the monitor of the MinSynchronousStandbys value for every replica group. If a node loses connection
with the cluster (all keepers are unhealthy, that is, a keeper has not updated its state for longer than minSyncMonitorUnhealthyTimeout),
the monitor decreases the MinSynchronousStandbys value for
every replica group related to the disconnected node to the maximum available value.
This helps prevent the read-only condition caused by a fake replica.
The maximum available value is always less than or equal to the value specified in the cluster configuration.
If all keepers related to the disconnected node become healthy again, the monitor resets the MinSynchronousStandbys value for the replica group
to the value specified in the cluster configuration.
Default: false.
minSyncMonitorUnhealthyTimeout #
Time interval after which the node (and all keepers related to this node) is considered unhealthy.
Works only if the minSyncMonitorEnabled parameter is set to true.
Default: 30s.
syncPointMonitorEnabled
#
Enables the monitor that creates a syncpoint every minute, ensuring that
Postgres Pro Shardman can be restored to a consistent LSN.
At each syncpoint, the cluster's state is consistent, meaning that all
transactions are complete. If this parameter is set to true,
PITR is guaranteed to work. syncPointMonitorEnabled
saves the syncpoint history in etcd under
the shardman/{cluster_name}/data/cluster/syncpoints key.
Default: false.
syncPointMonitorLimit
#
Specifies the limit for the size of the syncpoint history
saved by syncPointMonitorEnabled.
Default: 60.
syncpointInterval
#Specifies how often a syncpoint is created (seconds).
Default: 60 seconds.
syncpointEnabled
#
Enables the monitor that periodically creates
syncpoints.
Default: true.
syncpointLimit
#Number of the most recent syncpoints stored.
Default: 60.
shardmanSchemaDumpMonitorDisabled
#
Disables the shardmanSchemaDumpMonitor monitor that
periodically creates a dump of the shardman schema.
Default: true.
shardmanSchemaDumpMonitorTick
#
Sets how often the shardmanSchemaDumpMonitor monitor
creates a dump.
Default: every 20 minutes.
pgParameters #Hash table that determines PostgreSQL settings, including settings for a distributed system.
biHAReplicaWithProbackup #
If enabled, new BiHA followers and
referee-with-wal use probackup
as the synchronization method.
The following settings in pgParameters are
specific for a distributed system:
enable_sql_func_custom_plans (boolean)
#If enabled, custom plans can be created to execute statements inside SQL functions. These plans depend on the parameter values.
Query plans can be cached within one query. First, the plan is built five times with different parameter values; then a generic plan is created regardless of the values. If the costs of the custom and generic plans differ only slightly, the generic plan is cached and used in the future. However, custom plans allow more effective exclusion of queries to sharded table partitions when the choice of these partitions depends on the query parameter.
Default: off.
enable_merge_append (boolean)
#
Enables the use of MergeAppend plans
by the query planner.
Default: on.
enable_async_merge_append (boolean)
#
Enables or disables the query planner's use of async-aware
merge append plan types.
Default: on.
csn_snapshot_defer_time
(integer)
#Specifies the minimum age of records that are allowed to be vacuumed, in seconds.
For distributed system specifics, see csn_snapshot_defer_time.
All global transactions must start on all participant nodes within
csn_snapshot_defer_time seconds after start,
otherwise, they are aborted with a
“csn snapshot too old” error.
Default: 15.
csn_commit_delay
(integer)
#
Specifies the maximum possible clock skew (in nanoseconds) in the cluster. Adds a delay before every
commit in the system to ensure external consistency. If set to 0, external consistency is not guaranteed.
Value suffixes ns, us, ms
and s are allowed.
Default: 0.
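Since pgParameters values are strings, a sketch of setting a 100-microsecond commit delay in sdmspec.json could be (the value is illustrative):

```json
{
  "ShardSpec": {
    "pgParameters": {
      "csn_commit_delay": "100us"
    }
  }
}
```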
csn_lsn_map_size
(integer)
#Size of the CSNLSNMap.
The commit record of each completed transaction in Postgres Pro Shardman contains the CSN assigned
to this transaction. This value, together with the LSN of this record,
forms a (CSN, LSN) pair. Each cluster node stores a certain number of such
pairs in RAM in a special structure, the CSNLSNMap. This map is used to get the syncpoint.
See the “Syncpoints and Consistent Backup” section of the Internals chapter for more information.
Default: 1024.
csn_max_shift_error
(boolean)
#
If enabled, an error is raised instead of a warning
when the csn_max_shift value is exceeded.
Default: off.
csn_max_shift
(integer)
#
Maximum CSN shift in seconds for distributed queries
and imported snapshots. If the shift exceeds the csn_max_shift
value, an error or a warning occurs. If the value is set to 0, no check is run.
Default: 15 seconds.
foreign_analyze_interval
(integer)
#
Specifies how often foreign statistics should be gathered during autovacuum,
in seconds. If the value of foreign_analyze_interval
is less than autovacuum_naptime,
foreign statistics are gathered every autovacuum_naptime seconds.
Default: 60.
foreign_join_fast_path
(boolean)
#
Turns on a fast path for foreign join planning. When it is on, foreign join paths for SELECT
queries are searched before all other possible paths, and the search for a join stops as soon as a foreign join
path is found.
Default: off.
port
(integer)
#
A TCP port the server listens on. For a Postgres Pro Shardman
cluster, the port is assigned automatically by
the system based on the PGsInitialPort
parameter. If changed manually, the value is
overwritten by the automatically assigned
configuration parameter.
enable_partition_pruning_extra (boolean)
#Enables extended partition pruning for prepared queries with a known partitioning key. If turned on, partition-wise join plans can be pruned.
Default: off.
postgres_fdw.enforce_foreign_join
(boolean)
#
Turns on alternative estimations for foreign join costs, which greatly increases the chances that a join of several foreign
tables referring to the same server is pushed down. The cost of the original join is estimated
as (1 - 1/(cost + 1)), where cost
is the originally estimated cost of this remote join.
Default: off.
postgres_fdw.foreign_explain
(enum)
#
Defines how to include the EXPLAIN command output from the remote servers
if the query plan contains ForeignScan nodes.
The possible values are: none to exclude
the EXPLAIN output from the remote servers; full
to include the EXPLAIN output from the remote servers;
collapsed to include the EXPLAIN output
only for the first ForeignScan node under its
Append/MergeAppend.
Default: collapsed.
postgres_fdw.use_twophase
(enum)
#Sets postgres_fdw to use the two-phase commit (2PC) protocol for distributed transactions.
This parameter forces postgres_fdw to use
a two-phase commit if the transaction touches several nodes. When set to auto,
a two-phase commit is only used in transactions with enable_csn_snapshot=true
and an isolation level equal to or higher than REPEATABLE READ.
Temporary tables cannot be used in 2PC transactions.
Default: auto.
postgres_fdw.additional_ordered_paths (boolean)
#
When enabled, sorting on the remote server is considered if
it allows performing MergeJoin or
MergeAppend operations.
This parameter is enabled by default in new installations but
must be explicitly enabled in upgraded clusters.
shardman.broadcast_ddl
(boolean)
#Sets Postgres Pro Shardman extension to broadcast DDL statements to all replication groups.
When this parameter is on, the Postgres Pro Shardman extension broadcasts supported DDL statements to all replication groups if it makes sense for those statements. You can enable or disable this behavior at any time. This parameter is not honored when set in a configuration file. Note that it does not affect sequences, as they are only created per shard.
Default: off.
shardman.enable_limit_pushdown
(boolean)
#
Enables pushing down LIMIT clauses through the underlying appends. When on, the Postgres Pro Shardman
optimizer tries to push down a LIMIT clause to the subpaths of the underlying Append/MergeAppend plan node if
they reference postgres_fdw foreign tables. This optimization works only for SELECT plans
where the LIMIT option is represented as a constant or a parameter. It is also restricted to Append paths
corresponding to a partitioned table.
Default: on.
shardman.num_parts
(integer)
#Specifies the default number of sharded table partitions.
A sharded table has this default
number of partitions
unless num_parts
is specified in CREATE TABLE.
To allow scaling, shardman.num_parts should be larger
than the expected maximum
number of nodes in a Postgres Pro Shardman cluster.
Possible values are from 1 to 1000.
Default: 20.
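For example, to provision for growth up to roughly 40 nodes, an sdmspec.json fragment might raise the default partition count (the value is illustrative; pgParameters values are strings):

```json
{
  "ShardSpec": {
    "pgParameters": {
      "shardman.num_parts": "40"
    }
  }
}
```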
shardman.rgid
(integer)
#Specifies the replication group ID of a Postgres Pro Shardman node.
This parameter is set by Postgres Pro Shardman utilities when the node is added to the cluster and should never be changed manually.
Default: -1.
shardman.sync_schema
(boolean)
#Sets Postgres Pro Shardman to propagate all DDL statements that touch sharded and global relations to all replication groups.
When this parameter is on, Postgres Pro Shardman broadcasts all supported utility statements touching sharded and global relations to all replication groups. It is not recommended to turn this off. This parameter is not honored when set in a configuration file. It can only be modified by admins.
Default: on.
shardman.sync_cluster_settings
(boolean)
#Enables cluster-wide synchronization of configuration parameters set by the user. The configuration parameters are propagated with each remote query.
Default: on.
shardman.sync_cluster_settings_blacklist
(string)
#Lists the options that are not to be propagated to a remote cluster.
Default: local system configuration parameters that are never synchronized.
shardman.query_engine_mode
(enum)
#
Switches between modes of query planning/execution. Possible values are none and text.
none means that query planning/execution will not use the Silk transport.
text means that the text query representation is transferred via Silk transport for remote execution.
Default: none.
shardman.silk_use_ip
(string)
#Silk transport uses the IP address specified by this parameter for node identification. If a host name is specified, it is resolved, and the first IP address corresponding to this name is used.
Default: node hostname.
shardman.silk_listen_ip
(string)
#The Silk routing daemon listens for incoming connections on this IP address. If a host name is specified, it is resolved, and the first IP address corresponding to this name is used.
Default: node hostname.
shardman.silk_use_port
(integer)
#The Silk routing daemon listens for incoming connections on this port. This setting should be the same for all nodes in the Postgres Pro Shardman cluster.
Default: 8888.
shardman.silk_tracepoints
(bool)
#
Enables tracing of queries passing through the
Silk pipeline.
The tracing results can be accessed by running the
EXPLAIN command with ANALYZE
set to ON.
Default: off.
shardman.silk_num_workers
(integer)
#
Number of background workers allocated for distributed execution.
This setting must be less than max_worker_processes (which also counts auxiliary postgres worker processes).
Default: 2.
shardman.silk_stream_work_mem
(integer)
#Sets the base maximum amount of memory to be used by a Silk stream (as a buffer size) before writing to temporary disk files. If this value is specified without units, it is taken as kilobytes.
Note that most queries can perform multiple fetch operations
at the same time, usually one for each remote partition of a sharded table, if any.
Each fetch operation is generally allowed to use as much memory
as this value specifies before it starts to write data into temporary files.
Also, several running sessions can execute such operations concurrently.
Therefore, the total memory used by Silk
for buffers could be many times the value
of shardman.silk_stream_work_mem and is correlated with
shardman.num_parts. Thus, mind this
fact when choosing the value.
Default: 16MB.
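As a rough sizing sketch: with shardman.num_parts at its default of 20 and, say, 10 concurrent sessions each fetching from every remote partition, the buffers alone could need up to 20 × 10 × shardman.silk_stream_work_mem, that is, about 3.2GB at the 16MB default. A fragment lowering the per-stream limit might be (the value is illustrative):

```json
{
  "ShardSpec": {
    "pgParameters": {
      "shardman.silk_stream_work_mem": "8MB"
    }
  }
}
```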
shardman.silkworm_fetch_size
(integer)
#
Number of rows in a chunk that the silkworm worker
extracts and sends to the multiplexer as a result, per reading iteration.
Default: 100.
shardman.silk_unassigned_job_queue_size
(integer)
#
Size of the queue for jobs that have not yet been assigned to
silkworm multiplexer workers when all the workers
are busy.
Default: 1024.
shardman.silk_max_message
(integer)
#Maximum message size that can be transferred with Silk, in bytes. Note that this parameter does not limit the maximum size of the result returned by the query. It only affects messages sent to workers. Increasing this parameter value results in a proportional increase in the memory consumed by Postgres Pro Shardman. It is strongly recommended to use the default value unless there is an urgent need.
Default: 524288.
shardman.silk_hello_timeout
(integer)
#Handshake timeout between multiplexers of different nodes, in seconds.
Default: 3.
shardman.silk_scheduler_mode
(enum)
#
Enables additional CPU scheduling settings for the multiplexer processes
(silkroad and silkworm).
When this parameter is fifo, Postgres Pro Shardman
assigns the SCHED_FIFO scheduling policy to the silkroad process and to each
silkworm process. It sets the static scheduling priority (sched_priority) to the values of
shardman.silkroad_sched_priority and shardman.silkworm_sched_priority, respectively.
This setting improves Silk transport performance while it operates under heavy CPU load.
Note that the postgres binary needs the CAP_SYS_NICE capability to use this option.
If the capability is not assigned to the process, enabling this setting
has no effect. The capability must be assigned to the postgres binary before starting
postgres. postgres (that is, the silkroad and silkworm processes) applies the
scheduling options once during service start. You need to restart the postgres service if you want to change the scheduling options.
Default: none.
To set the capability, execute the following command once after postgres is installed:
$ sudo setcap cap_sys_nice+ep /opt/pgpro/sdm-17/bin/postgres
Replace /opt/pgpro/sdm-17/bin/postgres with the correct path to your postgres binary if needed. Also note that your filesystem must support extended file attributes. Set the capability on each node in the cluster for the setting to take full effect.
In the Linux kernel, there is a mechanism called real-time throttling, which is designed to prevent tasks with real-time scheduling policies (like
SCHED_FIFO) from monopolizing CPU resources. This ensures that other tasks with lower priorities, typically scheduled under the SCHED_OTHER policy, still get some amount of CPU time. This mechanism is controlled by two parameters, exported into the proc filesystem or the sysctl interface:
/proc/sys/kernel/sched_rt_period_us sets the duration of a scheduling period in microseconds. During this period, both real-time and non-real-time tasks share CPU time.
/proc/sys/kernel/sched_rt_runtime_us specifies how much of the scheduling period is allocated to real-time tasks (with SCHED_FIFO). The remainder of the time is left for non-real-time tasks (SCHED_OTHER).
A typical and acceptable configuration for Postgres Pro Shardman might set these parameters as follows:
# cat /proc/sys/kernel/sched_rt_period_us
1000000
# cat /proc/sys/kernel/sched_rt_runtime_us
950000
This configuration allows real-time tasks to use up to 950 milliseconds of each second, leaving 50 milliseconds for non-real-time tasks.
However, in some Linux distributions, the default values for these parameters might be set so low (or even to zero) that real-time tasks receive very little or no CPU time. This can make real-time scheduling ineffective or prevent the configuration from being applied. For example, attempting to manually set a task to the
SCHED_FIFO priority using chrt might result in an error like:
$ sudo chrt -f -p 2 $(pgrep -f silkroad)
chrt: failed to set pid 1897706's policy: Operation not permitted
This error indicates that the kernel parameters are not configured correctly. In such cases, run the following:
echo 1000000 > /proc/sys/kernel/sched_rt_period_us
echo 950000 > /proc/sys/kernel/sched_rt_runtime_us
Or add the corresponding values into
/etc/sysctl.conf and reload the settings using sysctl -p:
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
shardman.silkroad_sched_priority
(integer)
#
Value of the static scheduling priority (sched_priority) for the silkroad process. It only takes effect if shardman.silk_scheduler_mode is set to fifo.
Default: 2.
shardman.silkworm_sched_priority
(integer)
#
Value of the static scheduling priority (sched_priority) for silkworm processes (the same value for each of them). It only takes effect if shardman.silk_scheduler_mode is set to fifo.
Default: 1.
shardman.silk_set_affinity
(bool)
#
Enables pinning of the multiplexer processes (silkroad and
silkworm) to CPU cores to eliminate negative effects of threads'
cross-CPU migration.
When this parameter is true, the silkroad
process is pinned to the first available CPU core, and all silkworm
processes are pinned to all available CPU cores except the first one.
This setting improves Silk transport performance while it operates under heavy CPU load.
Note that the postgres binary needs the CAP_SYS_NICE capability to use this option.
If the capability is not assigned to the process, enabling this setting has
no effect. The capability must be assigned to the postgres binary before starting postgres.
postgres (that is, the silkroad and silkworm processes) applies the
affinity options once during service start. You need to restart the postgres service if you want to change the affinity options.
To set the capability, execute the following command once after postgres is installed:
$ sudo setcap cap_sys_nice+ep /opt/pgpro/sdm-17/bin/postgres
Replace /opt/pgpro/sdm-17/bin/postgres with the correct path to your postgres binary if needed. Also note that your filesystem must support extended file attributes. Set the capability on each node in the cluster for the setting to take full effect.
Default: false.
shardman.silk_flow_control
(enum)
#
Controls the mode of handling read events. It has three possible values:
none, round_robin, and
shortest_job_first.
The none mode means no control and no additional overhead.
However, in this case, the channel may become occupied by just one distributed query.
The round_robin mode means the events created earlier
are the first ones to be processed, for each event loop.
If enabled, all the backends are grouped, and the client
backends are prioritized over the others.
The shortest_job_first mode means full control
over the traffic. If enabled, all the backends are grouped, and the client
backends are prioritized over the others, along with the workers
with the least session traffic.
Default: round_robin.
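A fragment switching the multiplexer to full traffic control could be (shown for illustration only):

```json
{
  "ShardSpec": {
    "pgParameters": {
      "shardman.silk_flow_control": "shortest_job_first"
    }
  }
}
```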
shardman.silk_track_time
(boolean)
#
Enables or disables the metrics with prefix transferred_
and time-based metrics (with prefixes read_efd_,
write_efd_, and sort_time_).
If disabled, these metrics
have 0 values.
Default: off.
shardman.silk_tracelog
(bool)
#Enables or disables Silk logging.
Default: off.
shardman.silk_tracelog_categories
(string)
#Defines the Silk message categories to be traced.
Default: streams, routing, events.
shardman.database
(string)
#Name of the database that all Silk workers connect to.
Default: postgres.
shardman.monitor_interval
(integer)
#
shardman.monitor_interval is deprecated and acts as noop.
Use shardman.monitor_dxact_interval instead.
shardman.monitor_dxact_interval
(integer)
#Interval between checks for outdated prepared transactions.
The Postgres Pro Shardman monitor background process wakes up every shardman.monitor_dxact_interval
seconds and attempts to check and resolve any prepared transactions that did
not complete and became outdated for some reason. To resolve these transactions,
the Postgres Pro Shardman monitor process determines the coordinator of
the transaction and requests the transaction status from the coordinator. Based
on the status of the transaction, Postgres Pro Shardman monitor will
either roll back or commit the transaction.
To disable the prepared transaction resolution logic, set shardman.monitor_dxact_interval to 0.
Default: 5 (seconds).
shardman.monitor_trim_csnxid_map_interval
(integer)
#
Each cluster node freezes its own xmin value for
csn_snapshot_defer_time seconds to support global transactions.
Large csn_snapshot_defer_time values can negatively impact
the performance. Postgres Pro Shardman monitor
has a routine that every shardman.monitor_trim_csnxid_map_interval
seconds updates xmin on all
nodes to the minimum possible value (taking into account active transactions).
The background routine will run on only one node in the Postgres Pro Shardman cluster. Note that this will give an additional load on this node.
To disable such updates, set shardman.monitor_trim_csnxid_map_interval to 0.
Default: 5 (seconds).
shardman.maintenance_user
(string)
#
Is automatically set during shardmanctl init
for users from the admin group, and then
takes effect with shardmand on each instance, which
creates this user and grants it the following privileges:
-- Prepared xacts resolver
GRANT EXECUTE ON FUNCTION shardman.xact_status TO $uname;
GRANT pg_manage_transactions TO $uname;
-- Deadlock detector
GRANT SELECT ON TABLE shardman.lock_graph TO $uname;
GRANT EXECUTE ON FUNCTION shardman.vxid_current TO $uname;
GRANT pg_read_all_stats TO $uname;
-- CSN trimmer
GRANT EXECUTE ON FUNCTION shardman.oldest_csn_snapshot TO $uname;
GRANT EXECUTE ON FUNCTION pg_trim_csnxid_map TO $uname;
After this, the deadlock detector, the monitor of hanging transactions, and trimming by CSN are executed under this user.
Note that this is done automatically for the certified version and should not be tampered with unless there is a reasonable necessity.
shardman.monitor_dxact_timeout
(integer)
#Maximum allowed age of prepared transactions before a resolution attempt.
During the resolution of a prepared transaction,
Postgres Pro Shardman monitor determines whether the transaction is outdated or not.
A transaction becomes outdated if it was prepared more than shardman.monitor_dxact_timeout seconds ago.
Default: 5 (seconds).
shardman.trim_csnxid_map_naptime
(integer)
#
Specifies the minimum delay between xmin updates on all nodes. See shardman.monitor_trim_csnxid_map_interval for more information.
Possible values are from 1 to 600.
Default: 5.
shardman.monitor_deadlock_interval
(integer)
#Interval between checks for distributed deadlock conditions.
The Postgres Pro Shardman monitor
background process wakes up every shardman.monitor_deadlock_interval
seconds and searches for distributed deadlocks in the cluster. It gathers information about
mutual locks from all nodes and looks for circular dependencies between transactions. If it
detects a deadlock, it resolves it by canceling one of the backend processes involved in the lock.
To disable the distributed deadlock resolution logic, set shardman.monitor_deadlock_interval to 0.
Default: 2 (seconds).
postgres_fdw.remote_plan_cache
(boolean) — EXPERIMENTAL
#Enables remote plan caching for FDW queries produced by locally cached plans.
Default: off.
shardman.plan_cache_mem
(integer) — EXPERIMENTAL
#Specifies how much memory per worker can be used for remote plan caches.
Default: 0 (caches are disabled).
shardman.gt_batch_size
(integer)
#
Specifies the buffer size for INSERT and DELETE
commands executed on a global table.
Default: 64K.
track_fdw_wait_timing
(boolean)
#
Enables collection of statistics on the network latency (wait time) for
inter-cluster operations, in milliseconds.
The statistics can be accessed by running the EXPLAIN command
with the network parameter enabled.
Default: on.
track_xact_time
(boolean)
#Enables or disables statistics collection for time spent on a transaction.
Default: off.
enable_non_equivalence_filters
(boolean)
#Enables the optimizer to generate additional non-equivalence conditions using equivalence classes.
Default: off.
optimize_row_in_expr
(boolean)
#
Enables the optimizer to generate additional conditions
from the IN () expression.
Default: off.
shardman.extended_password_policies_skip_users
(array of string)
#Specifies the list of system users for which no exchange of information on authorization events between Postgres Pro Shardman cluster nodes takes place and which cannot be locked. They, however, can be authorized on a Postgres Pro Shardman standby.
If the list of such users has not changed since Postgres Pro Shardman installation, you do not need to specify this setting.
This list includes the hardcoded users postgres
and biha_replication_user. So to avoid server
connection errors, add these users to the value of this
configuration parameter, for example:
shardmanctl config update '{"pgParameters": {"shardman.extended_password_policies_skip_users": "postgres,biha_replication_user,some_user"}}' -p --force -y
Note that if any of the hardcoded users was replaced by some other user during the installation, the replaced user does not need to be listed.
Default: postgres, biha_replication_user.
shardman.extended_password_policies
(boolean)
#
If enabled, all extended password policies are applied cluster-wide.
For example, profile-related
counters are synchronized cluster-wide, which can be useful
for FAILED_LOGIN_ATTEMPTS. If disabled, each node
has its own counters.
Default: off.
The initial configuration file should be generated with the following command:
shardmanctl config generate > sdmspec.json
The example below is for educational purposes only and may lack the latest updates.
This is the contents of an example sdmspec.json configuration file:
{
"ConfigVersion": "1",
"Repfactor": 1,
"PGsInitialPort": 5432,
"SilkInitialPort": 8000,
"BiHAInitialPort" : 25432,
"HTTP": {
"Port": 15432,
"PortMetrics": 15432
},
"Users": [
{
"Name": "postgres",
"Groups": [ "su"],
"AuthMethod": "scram-sha-256",
"Password": "changeMe"
},
{
"Name": "biha_replication_user",
"Groups": ["repl"],
"AuthMethod": "scram-sha-256",
"Password": "changeMe"
}
],
"ShardSpec": {
"usePgrewind": true,
"pgParameters": {
"csn_snapshot_defer_time": "300",
"enable_csn_snapshot": "on",
"shardman.query_engine_mode": "text",
"shardman.silk_num_workers": "8",
"max_connections": "600",
"max_files_per_process": "65535",
"max_logical_replication_workers": "14",
"max_prepared_transactions": "200",
"max_worker_processes": "24",
"shared_preload_libraries": "postgres_fdw, shardman, biha"
},
"pgHBA": [
"host replication postgres 0.0.0.0/0 scram-sha-256",
"host replication postgres ::0/0 scram-sha-256"
],
"automaticPgRestart": true,
"masterDemotionEnabled": false
},
"FDWOptions": {
"async_capable": "on",
"batch_size": "100",
"connect_timeout": "5",
"fdw_tuple_cost": "0.2",
"fetch_size": "50000",
"tcp_user_timeout": "10000"
}
}
From this configuration file, you can see that
a Postgres Pro Shardman cluster initialized with this spec file
has Repfactor equal to 1 (one replica for each master).
The configuration file also shows that two special
users are created in this cluster: the superuser postgres
and the replication user biha_replication_user, both with the
changeMe password. They are authenticated using the
scram-sha-256 authentication method. One postgres_fdw
fetch operation will get up to 50000 rows from the remote server. The cost of fetching
one row is set to a reasonably high value to make the PostgreSQL
planner consider pushing conditions down to the remote server. The pg_hba.conf settings
allow the postgres user replication access from anywhere. Since
defaultSUReplAccessMode is not set to
strict, utilities will automatically add
entries that allow the PgSuUsername
user (postgres) access to any database from anywhere and the
PgReplUsername user (biha_replication_user)
replication access from anywhere.
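The example spec can also be inspected programmatically. Below is a minimal Python sketch (not part of shardmanctl) that parses an inline excerpt of the example above; in practice you would read the real sdmspec.json file instead of the embedded string.

```python
import json

# Inline excerpt of the example sdmspec.json shown above (illustration only);
# a real script would use json.load(open("sdmspec.json")).
spec = json.loads("""
{
  "Repfactor": 1,
  "Users": [
    {"Name": "postgres", "Groups": ["su"], "AuthMethod": "scram-sha-256"},
    {"Name": "biha_replication_user", "Groups": ["repl"], "AuthMethod": "scram-sha-256"}
  ],
  "FDWOptions": {"fetch_size": "50000", "fdw_tuple_cost": "0.2"}
}
""")

# One replica per master.
assert spec["Repfactor"] == 1

# Both special users authenticate with scram-sha-256.
auth = {u["Name"]: u["AuthMethod"] for u in spec["Users"]}

# A single postgres_fdw fetch brings back up to 50000 rows.
fetch_size = int(spec["FDWOptions"]["fetch_size"])
print(auth, fetch_size)
```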
Several parameters important for a distributed system
are set in the pgParameters hash table. For regular
server configuration parameters, see Section 18.7. These are:
wal_level #
Should be set to logical for
Postgres Pro Shardman to work correctly.
shared_preload_libraries #Should include postgres_fdw and shardman extensions in the specified order.
max_logical_replication_workers #
Should be rather high since the rebalance process uses up to
max(max_replication_slots,
max_logical_replication_workers, max_worker_processes,
max_wal_senders)/3
concurrent threads.
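The rebalance concurrency cap above can be computed as in this Python sketch. The values are taken from the first example spec; max_replication_slots and max_wal_senders are not set there, so their stock PostgreSQL defaults (10) are assumed.

```python
# Compute the rebalance concurrency cap:
# max(max_replication_slots, max_logical_replication_workers,
#     max_worker_processes, max_wal_senders) / 3 concurrent threads.
pg_parameters = {
    "max_replication_slots": 10,        # assumed stock default
    "max_logical_replication_workers": 14,
    "max_worker_processes": 24,
    "max_wal_senders": 10,              # assumed stock default
}

rebalance_threads = max(pg_parameters.values()) // 3
print(rebalance_threads)  # max(10, 14, 24, 10) // 3 = 8
```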
max_prepared_transactions #
Should be rather high since Postgres Pro Shardman utilities
use the 2PC protocol. If postgres_fdw.use_twophase
is true, postgres_fdw also uses 2PC.
enable_csn_snapshot #
Should be enabled to achieve a true REPEATABLE READ isolation level in
a distributed system.
csn_snapshot_defer_time #
All global transactions must start on all participant nodes within
csn_snapshot_defer_time seconds after start,
otherwise they will be aborted.
enable_partitionwise_aggregate, enable_partitionwise_join #
Set to on to enable optimizations for partitioned tables.
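The requirements above can be checked with a small sanity script. This Python sketch (an assumed helper, not an exhaustive validator) verifies the pgParameters constraints just described.

```python
# Minimal sanity checks for the pgParameters discussed above.
pg_parameters = {
    "wal_level": "logical",
    "shared_preload_libraries": "postgres_fdw, shardman, biha",
    "max_prepared_transactions": "200",
    "enable_csn_snapshot": "on",
    "csn_snapshot_defer_time": "300",
}

# wal_level must be logical for Postgres Pro Shardman.
assert pg_parameters["wal_level"] == "logical"

# postgres_fdw must precede shardman in shared_preload_libraries.
libs = [l.strip() for l in pg_parameters["shared_preload_libraries"].split(",")]
assert libs.index("postgres_fdw") < libs.index("shardman")

# 2PC requires prepared transactions to be allowed.
assert int(pg_parameters["max_prepared_transactions"]) > 0

# CSN snapshots must be enabled for true REPEATABLE READ.
assert pg_parameters["enable_csn_snapshot"] == "on"
print("pgParameters sanity checks passed")
```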
This is the contents of an example sdmspec.json configuration file with SSL and certificate authentication enabled:
{
"ConfigVersion": "1",
"HTTP": {
"Port": 15432,
"PortMetrics": 15432,
"SSLKey": "/pgpro/ssl/server.key",
"SSLCert": "/pgpro/ssl/server.crt"
},
"Users": [
{
"Name": "postgres",
"SSLKey": "/var/lib/postgresql/.ssh/client.key",
"SSLCert": "/var/lib/postgresql/.ssh/client.crt",
"Groups": ["su"],
"AuthMethod":"scram-sha-256"
},
{
"Name": "biha_replication_user",
"SSLKey": "/var/lib/postgresql/.ssh/repluser.key",
"SSLCert": "/var/lib/postgresql/.ssh/repluser.crt",
"Groups": ["repl"],
"AuthMethod":"scram-sha-256"
}
],
"ShardSpec": {
"synchronousReplication": true,
"usePgrewind": true,
"certVersion": true,
"syncPointEnabled": true,
"syncpointLimit": 60,
"syncpointInterval": "60s",
"pgParameters": {
"ssl": "on",
"ssl_cert_file": "/var/lib/postgresql/.ssh/server.crt",
"ssl_key_file": "/var/lib/postgresql/.ssh/server.key",
"ssl_ca_file": "/var/lib/postgresql/.ssh/ca.crt",
"csn_snapshot_defer_time": "300",
"enable_csn_snapshot": "on",
"log_line_prefix": "%m [%r][%p]",
"log_min_messages": "INFO",
"log_statement": "none",
"maintenance_work_mem": "1GB",
"max_connections": "600",
"max_files_per_process": "65535",
"max_logical_replication_workers": "9",
"max_prepared_transactions": "200",
"max_wal_size": "4GB",
"max_worker_processes": "16",
"min_wal_size": "512MB",
"shardman.query_engine_mode": "text",
"shardman.silk_num_workers": "8",
"shared_buffers": "4GB",
"shared_preload_libraries": "postgres_fdw, shardman, biha"
},
"strictUserHBA": true,
"pgHBA": [
"hostssl all postgres 0.0.0.0/0 cert clientcert=verify-full",
"hostssl all repluser 0.0.0.0/0 cert clientcert=verify-full",
"hostssl replication postgres 0.0.0.0/0 cert clientcert=verify-full",
"hostssl replication postgres ::0/0 cert clientcert=verify-full",
"hostssl replication repluser 0.0.0.0/0 cert clientcert=verify-full",
"hostssl replication repluser ::0/0 cert clientcert=verify-full",
"hostnossl all all 0.0.0.0/0 reject",
"local postgres postgres scram-sha-256",
"local replication repluser scram-sha-256"
],
"automaticPgRestart": true,
"masterDemotionEnabled": false
},
"FDWOptions": {
"async_capable": "on",
"batch_size": "100",
"connect_timeout": "5",
"fdw_tuple_cost": "0.2",
"fetch_size": "50000",
"tcp_user_timeout": "10000"
}
}
Note that if the certVersion is set to true,
you cannot modify pg_hba.conf.
To create a cluster with the topology specified in the sdmspec.json
configuration file, place it under the Topology key.
The topology format is a key-value pair,
with the key being a string with the shard name (shard_name-n,
where n is the shard number), and the value being the array of nodes for the specified shard.
Each node must contain the following information:
{
"host": "host", // name or IP of a node
"role": "leader", // node role: leader, follower, referee, referee-with-wal
"pgPort": 5432, // PostgreSQL instance port
"silkPort": 8000, // silk port
"bihaPort": 25432, // biha port
"priority": 10 // node priority: a time interval in milliseconds after which the node can be suggested as a leader candidate. The smaller the value, the higher the priority. Optional parameter.
}
Also mind that the specified topology must comply with a number of rules:
Each shard must contain at least one leader node.
The number of shard followers must be greater than or equal to
the minSynchronousStandbys value.
The host:pgPort pair must be unique for each node.
The ports cannot repeat within one node.
The node priority must either be not specified or be not less than 0. The referee or referee-with-wal instances must have no priority.
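The rules above can be expressed as a small validator. This Python sketch (an assumed helper, not part of shardmanctl) checks a topology dictionary against those rules.

```python
# Sketch of a validator for the topology rules listed above.
def validate_topology(topology, min_synchronous_standbys=0):
    endpoints = set()
    for shard, nodes in topology.items():
        roles = [n["role"] for n in nodes]
        # Each shard must contain at least one leader node.
        if roles.count("leader") < 1:
            raise ValueError(f"{shard}: no leader node")
        # Followers must be >= minSynchronousStandbys.
        if roles.count("follower") < min_synchronous_standbys:
            raise ValueError(f"{shard}: too few followers")
        for n in nodes:
            # host:pgPort must be unique across the whole topology.
            ep = (n["host"], n["pgPort"])
            if ep in endpoints:
                raise ValueError(f"{shard}: duplicate endpoint {ep}")
            endpoints.add(ep)
            # Ports must not repeat within one node.
            ports = [n["pgPort"], n["silkPort"], n["bihaPort"]]
            if len(set(ports)) != len(ports):
                raise ValueError(f"{shard}: repeated port on {n['host']}")
            # referee / referee-with-wal nodes must carry no priority.
            if n["role"].startswith("referee") and "priority" in n:
                raise ValueError(f"{shard}: referee must have no priority")

# A valid single-shard topology, mirroring the example format.
topology = {
    "shard-1": [
        {"host": "host1", "role": "leader", "pgPort": 5432,
         "silkPort": 8000, "bihaPort": 25432, "priority": 10},
        {"host": "host2", "role": "follower", "pgPort": 5433,
         "silkPort": 8001, "bihaPort": 25433, "priority": 1000},
    ],
}
validate_topology(topology)
print("topology is valid")
```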
Example:
{
"ConfigVersion": "1",
"Repfactor": 1,
"Topology": {
"shard-1": [
{
"host": "host1",
"role": "leader",
"pgPort": 5432,
"silkPort": 8000,
"bihaPort": 25432,
"priority": 10
},
{
"host": "host2",
"role": "follower",
"pgPort": 5433,
"silkPort": 8001,
"bihaPort": 25433,
"priority": 1000
},
{
"host": "host3",
"role": "referee",
"pgPort": 5432,
"silkPort": 8000,
"bihaPort": 25432
}
],
"shard-2": [
{
"host": "host2",
"role": "leader",
"pgPort": 5432,
"silkPort": 8000,
"bihaPort": 25432,
"priority": 10
},
{
"host": "host1",
"role": "follower",
"pgPort": 5433,
"silkPort": 8001,
"bihaPort": 25433,
"priority": 1000
},
{
"host": "host4",
"role": "referee-with-wal",
"pgPort": 5432,
"silkPort": 8000,
"bihaPort": 25432
}
]
},
"ShardSpec": {
"synchronousReplication": true,
"minSynchronousStandbys": 0,
....
You can configure the biha control channel in the protected mode by creating a certificate and key pair and specifying certificate paths. In this case, add the following to your specification file:
"ShardSpec": {
"biHAProtocol": {
"SSLCert": "/var/lib/postgresql/.ssh/biha_pub_cert.pem",
"SSLKey": "/var/lib/postgresql/.ssh/biha_priv_key.pem",
"SSLCertDays": 365,
"SSLCertDomain": "localhost"
},
...
}
To create custom log settings, use the placeholders supported by the shardmanctl config update and shardmanctl set commands.
Example:
{
"log_directory": "{{dataDir}}",
"log_filename": "postgres-{{host}}-{{cluster}}-{{shard}}-{{keeperID}}.log"
}
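The effect of these placeholders can be illustrated with a simple substitution sketch in Python. The substitution itself is performed by shardmanctl; the values below (host1, cluster0, shard-1, keeper0) are hypothetical.

```python
# Hypothetical placeholder values; shardmanctl fills these in from
# the actual cluster metadata.
values = {
    "dataDir": "/var/lib/pgpro/sdm-17/data",
    "host": "host1",
    "cluster": "cluster0",
    "shard": "shard-1",
    "keeperID": "keeper0",
}

def expand(template, values):
    """Replace each {{key}} placeholder with its value."""
    for key, val in values.items():
        template = template.replace("{{" + key + "}}", val)
    return template

print(expand("postgres-{{host}}-{{cluster}}-{{shard}}-{{keeperID}}.log", values))
# postgres-host1-cluster0-shard-1-keeper0.log
```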