shardmand — Postgres Pro Shardman configuration daemon
shardmand is the Postgres Pro Shardman configuration daemon that runs on each node in a Shardman cluster and provides distributed key-value storage for service information.
shardmand generates a boot_uuid at every startup,
monitors changes to the shardman/cluster0/data/ladle and
shardman/cluster0/data/cluster keys in its distributed storage
(cluster0 is the default cluster name used by Shardman
utils), and manages Postgres Pro Shardman processes
on the node where it is running according to the configuration described in these JSON
documents.
shardmand manages integrated keepers.
On startup and when one of the monitored keys changes in the distributed storage,
shardmand reconfigures them as follows:
It calculates the expected node configuration, i.e., the list of
keepers expected to run and their configurations, from the
shardman/cluster0/data/ladle and shardman/cluster0/data/cluster
values.
It receives the list of running keepers
with their configurations from the internal process manager.
It stops processes that are not expected to run.
This can be a process that belongs to a cluster with the same name but a different UUID, or a process whose
description is no longer present in the expected node configuration. For
keeper processes, shardmand purges
their data directories.
If a process should be running, but its settings are different from the expected ones, shardmand updates the configuration and restarts the process. If a process should be running, but it is not running, shardmand starts it.
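The reconfiguration steps above can be sketched as a pure planning function. This is only an illustration under stated assumptions: the Keeper type, the action names, and the plan_actions function are hypothetical and are not part of shardmand; the real daemon obtains the running list from its internal process manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keeper:
    """Hypothetical description of one keeper process."""
    name: str
    cluster_uuid: str
    settings: tuple  # hashable stand-in for the keeper configuration


def plan_actions(expected, running, cluster_uuid):
    """Return (action, keeper_name) pairs that move `running` toward `expected`.

    `expected` and `running` are dicts keyed by keeper name.
    """
    actions = []
    # Stop processes that belong to another cluster UUID or are no longer
    # described in the expected configuration, and purge their data directory.
    for name, proc in running.items():
        if proc.cluster_uuid != cluster_uuid or name not in expected:
            actions.append(("stop", name))
            actions.append(("purge_data_dir", name))
    for name, want in expected.items():
        have = running.get(name)
        if have is None or have.cluster_uuid != cluster_uuid:
            # Expected but not running (or just stopped above): start it.
            actions.append(("start", name))
        elif have.settings != want.settings:
            # Running with outdated settings: update and restart.
            actions.append(("update_config", name))
            actions.append(("restart", name))
    return actions
```

Keeping the planning step free of side effects makes it easy to compare the expected and actual state in isolation before any process is touched.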
Also, a separate thread of shardmand periodically updates
the shardman/cluster0/data/shardmand/NODENAME
key in the distributed storage with the ClusterUUID of the last cluster to which the
configuration was applied. This way, before the shardmanctl nodes add command tries to initialize new clusters
for a clover, it can ensure that no live
threads from a previous cluster configuration remain on any node in the clover.
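As a rough illustration of the key layout mentioned so far, the helpers below build the key names for a given cluster and find nodes that still report an older ClusterUUID. The function names are hypothetical. The sketch also assumes that the cluster0 path component is simply the cluster name and that the per-node values have already been read and decoded into a plain node-to-UUID mapping; the stored value format is not described here.

```python
def monitored_keys(cluster_name="cluster0"):
    """Keys that shardmand watches for configuration changes."""
    prefix = f"shardman/{cluster_name}/data"
    return [f"{prefix}/ladle", f"{prefix}/cluster"]


def node_status_key(node_name, cluster_name="cluster0"):
    """Key that a node periodically updates with its last applied ClusterUUID."""
    return f"shardman/{cluster_name}/data/shardmand/{node_name}"


def stale_nodes(reported_uuids, current_uuid):
    """Return the sorted names of nodes whose reported UUID differs.

    `reported_uuids` is an assumed, already decoded {node_name: cluster_uuid}
    mapping.
    """
    return sorted(n for n, u in reported_uuids.items() if u != current_uuid)
```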
Additionally, shardmand starts two HTTP servers in separate threads.
If the ports of the two servers match, a single server that performs both roles is started.
The first server provides the following metrics:
shardmand_healthy_keepers, shardmand_uptime,
shardmand_reconfigurations_number_total, and
shardmand_demotions_number_total. Also, the server provides the /healthz
endpoint for shardmand health-check.
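A health probe against the /healthz endpoint might look like the sketch below. It assumes plain HTTP, the default port 15432, and that a 200 response means the daemon is healthy; none of these details are confirmed by the text above, and the function names are illustrative.

```python
import urllib.error
import urllib.request


def healthz_url(host, port=15432):
    """Build the health-check URL (plain HTTP is an assumption)."""
    return f"http://{host}:{port}/healthz"


def is_healthy(host, port=15432, timeout=2.0):
    """Return True if /healthz answers with HTTP 200, False otherwise.

    Treating 200 as "healthy" is an assumption; connection errors, timeouts,
    and non-2xx responses are all reported as unhealthy.
    """
    try:
        with urllib.request.urlopen(healthz_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

If TLS is configured for the server, the scheme would need to change accordingly.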
The second server provides the following endpoints:
/shardmand/v1/replica: returns the 200 status code if a secondary instance is running on the node and the 500 status code if a primary instance is running on the node.
/shardmand/v1/master: returns the 200 status code if a primary instance is running on the node and the 500 status code if a secondary instance is running on the node.
/shardmand/v1/referee: returns the 200 status code if the only instance running on the node is a referee, the 500 status code if the only instance running on the node is a primary or secondary, and the 404 status code if more than one instance is running.
If both primary and secondary instances are running on the node, the /shardmand/v1/replica and /shardmand/v1/master endpoints return the 404 status code.
/shardmand/v1/status: returns information about the shardmand status, including boot_uuid.
/shardmand/v1/tables: returns a list with information about sharded and global tables in the following format:
[
  {
    "table": "schema.table",
    "type": "sharded",
    "distributed_by": [
      {
        "number": 1,
        "field": "key",
        "type": "integer"
      }
    ],
    "colocate_with": "schema.table",
    "partitions": [
      {
        "shard": "shard-1",
        "code": 0
      }
    ],
    "fields": [
      {
        "name": "key",
        "type": "integer",
        "nullable": false
      }
    ]
  }
]
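A client of the second server could interpret the responses roughly as follows. The status-code table restates the endpoint descriptions above, and the parser relies only on the field names shown in the sample document. The function names are hypothetical, and treating a missing partitions field as an empty list is an assumption.

```python
import json

# Status-code meanings restated from the endpoint descriptions above.
ROLE_STATUS = {
    "replica": {
        200: "secondary running",
        500: "primary running",
        404: "both primary and secondary running",
    },
    "master": {
        200: "primary running",
        500: "secondary running",
        404: "both primary and secondary running",
    },
    "referee": {
        200: "only a referee running",
        500: "only a primary or secondary running",
        404: "more than one instance running",
    },
}


def describe_role_status(endpoint, status):
    """Translate a role endpoint name and HTTP status into a description."""
    return ROLE_STATUS.get(endpoint, {}).get(status, "unknown")


def shards_by_table(payload):
    """Map each table name to the sorted shard names holding its partitions.

    `payload` is the JSON text returned by /shardmand/v1/tables.
    """
    tables = json.loads(payload)
    return {
        t["table"]: sorted({p["shard"] for p in t.get("partitions", [])})
        for t in tables
    }
```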
All Postgres Pro Shardman services are managed by
shardmand@cluster0.service, so when it is started, stopped, or restarted, it
also starts, stops, or restarts all other Postgres Pro Shardman processes
(including DBMS instances).
shardmand [common_options] [--system-bus] [--user user_name] [--zone-name]
Here common_options are:
[--cluster-name cluster_name] [--log-level error | warn | info | debug] [--version] [-h | --help] [--log-format]
This section describes shardmand-specific command-line
options.
--log-format
Specifies the log output format: json or text. The default is text.
--system-bus
Not used and ignored. Retained for compatibility.
--zone-name
Shows the current zone the cluster is located in. By default, not specified.
--ipc-socket socket
Specifies the path to a socket. The default is <tmpdir>/shardmand.<shardmand_port>.sock.
--user user_name
Not used and ignored. Retained for compatibility.
shardmand common options are optional parameters that are not
specific to the utility. They specify
the cluster name and a few more settings. By default shardmand
uses the cluster0 cluster name. The default log level is info.
-h, --help
Show brief usage information.
--cluster-name cluster_name
Specifies the name for a cluster to operate on. The default is cluster0.
--data-dir directory
Specifies a directory for storing data. The default is /var/lib/pgpro/sdm-18/data. The directory must have the 0700 or 0750 permissions. shardmand does not start if the directory does not have the required permissions.
--log-level level
Specifies the log verbosity. Possible values of level are (from minimum to maximum): error, warn, info, and debug. The default is info.
--server-host string
Specifies the external IP address or hostname of shardmand. The default is hostname.
--server-port number
Specifies the port number for the shardmand HTTP server. The port number must be the same for all nodes. The default is 15432.
--server-metrics-port number
Specifies the port for the shardmand HTTP server for gathering metrics. This port can match the --server-port value. The default is 15432.
--server-ssl-key string
Specifies the private key for the shardmand HTTP server. By default, not specified.
--server-ssl-cert string
Specifies the certificate file for client identification by the shardmand HTTP server. By default, not specified.
--monitor-port number
Specifies the port for the shardmand HTTP server for metrics and probes. The default is 15432.
--api-port number
Specifies the port for the shardmand HTTP API server. The default is 15432.
--version
Show shardman-utils version information.
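Because shardmand refuses to start when the --data-dir directory has the wrong permissions, it can be convenient to check the mode ahead of time. The sketch below is one possible pre-flight check on a POSIX system. The helper name is hypothetical, and comparing the full permission bits (including setuid, setgid, and sticky) is a deliberately strict assumption, not a description of how shardmand itself validates the directory.

```python
import os
import stat

# Modes accepted for the data directory, as listed for --data-dir above.
ALLOWED_MODES = (0o700, 0o750)


def data_dir_mode_ok(path):
    """Return True if `path` has exactly one of the allowed permission modes."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode in ALLOWED_MODES
```

For example, data_dir_mode_ok("/var/lib/pgpro/sdm-18/data") could be called from a deployment script before the service is started.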
A shardmand service reads the environment from
/etc/shardman/shardmand-cluster0.env. The following environment variables
affect the behavior of shardmand.
The shardmand service uses the value of an environment variable only if the corresponding option is not set. If neither is specified, the corresponding default value is used.
SDM_CLUSTER_NAME
An alternative to setting the --cluster-name option.
SDM_DATA_DIR
An alternative to setting the --data-dir option.
SDM_LOG_LEVEL
An alternative to setting the --log-level option.
SDM_SERVER_HOST
An alternative to setting the --server-host option.
SDM_SERVER_PORT
An alternative to setting the --server-port option.
SDM_SERVER_METRICS_PORT
An alternative to setting the --server-metrics-port option.
SDM_SERVER_SSL_KEY
An alternative to setting the --server-ssl-key option.
SDM_SERVER_SSL_CERT
An alternative to setting the --server-ssl-cert option.
SDM_SYSTEM_BUS
An alternative to setting the --system-bus option.
SDM_ZONE_NAME
An alternative to setting the --zone-name option.
SDM_USER
An alternative to setting the --user option.
SDM_IPC_SOCKET
An alternative to setting the --ipc-socket option.
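The precedence described for options and environment variables (command-line option first, then the environment variable, then the default) can be illustrated with a small resolver. This is only a sketch: resolve_setting is a hypothetical helper, and treating an empty but defined environment variable as set is an assumption.

```python
import os


def resolve_setting(option_value, env_name, default, environ=None):
    """Pick a setting value: option first, then environment, then default.

    `option_value` is None when the command-line option was not given.
    `environ` defaults to the process environment and can be replaced with
    a plain dict for testing.
    """
    environ = os.environ if environ is None else environ
    if option_value is not None:
        return option_value
    if env_name in environ:
        return environ[env_name]
    return default
```

For instance, resolve_setting(None, "SDM_LOG_LEVEL", "info", {"SDM_LOG_LEVEL": "debug"}) yields "debug", while passing "warn" as the option value takes priority over the environment.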