shardmand — Shardman configuration daemon
shardmand [common_options] [--system-bus] [--user user_name]
Here common_options are:
[--cluster-name cluster_name]
[--log-level error | warn | info | debug]
[--retries retries_number]
[--session-timeout seconds]
[--store-endpoints store_endpoints]
[--store-ca-file store_ca_file]
[--store-cert-file store_cert_file]
[--store-key client_private_key]
[--store-timeout duration]
[--version]
[-h | --help]
shardmand is a Shardman
configuration daemon. It runs on each node in a Shardman
cluster, subscribes to changes of the shardman/cluster0/data/ladle and
shardman/cluster0/data/cluster keys in the etcd
store (cluster0 is the default cluster name used by
Shardman utils) and manages Shardman processes
on the node where it is running according to the configuration described in these JSON
documents.
shardmand manages integrated keepers and sentinels.
On startup and when one of the monitored etcd
keys changes, shardmand reconfigures them as follows:
It calculates the expected node configuration, i.e., the list of
keepers and sentinels expected to run and their configurations, from the
shardman/cluster0/data/ladle and shardman/cluster0/data/cluster
values.
It receives the list of running keepers and sentinels
with their configurations from the internal process manager.
It stops processes that are not expected to run.
This can be a process that belongs to a cluster with the same name, but a different UUID, or a process whose
description is no longer present in the expected node configuration. For
keeper processes, shardmand also purges
their data directories.
If a process should be running, but its settings are different from the expected ones, shardmand updates the configuration and restarts the process.
If a process should be running, but it is not running, shardmand starts it.
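The reconfiguration steps above can be sketched as a pure planning function. This is a minimal illustration, not the shardmand implementation: the Proc record, its fields, and the action names ("stop", "purge_data_dir", "start", "update_config", "restart") are hypothetical and chosen only to mirror the wording of the steps.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    """Hypothetical description of a keeper or sentinel process."""
    name: str
    kind: str          # "keeper" or "sentinel"
    cluster_uuid: str
    config: str        # opaque settings, compared for equality only


def plan(expected, running):
    """Return (action, process name) steps that bring `running` in line with `expected`."""
    want = {p.name: p for p in expected}
    have = {p.name: p for p in running}
    steps = []

    # Stop processes that are no longer described or belong to another cluster UUID.
    for name, proc in have.items():
        target = want.get(name)
        if target is None or target.cluster_uuid != proc.cluster_uuid:
            steps.append(("stop", name))
            if proc.kind == "keeper":
                steps.append(("purge_data_dir", name))

    # Start missing processes; update and restart those whose settings differ.
    for name, target in want.items():
        proc = have.get(name)
        if proc is None or proc.cluster_uuid != target.cluster_uuid:
            steps.append(("start", name))
        elif proc.config != target.config:
            steps.append(("update_config", name))
            steps.append(("restart", name))
    return steps
```

In this sketch a process that has the expected name but a stale cluster UUID is stopped, has its data directory purged if it is a keeper, and is then started again with the expected configuration.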
Also, a separate thread of shardmand periodically updates
the shardman/cluster0/data/shardmand/NODENAME etcd
key with the ClusterUUID of the last cluster to which the
configuration was applied. This way, before the shardmanctl nodes add
command tries to initialize new stolon clusters for a clover, it can
ensure that no live stolon threads from a previous cluster configuration
remain on any node in the clover.
Additionally, shardmand starts two HTTP servers in separate threads.
If the ports of the two servers match, a single server that performs both roles is started.
The first server provides the following metrics: shardmand_etcd_unavailable_time_seconds,
shardmand_healthy_keepers, shardmand_sentinels, shardmand_uptime,
shardmand_etcd_errors_total, shardmand_reconfigurations_number_total, and
shardmand_demotions_number_total. This server also provides a /healthz
endpoint for the shardmand health check.
The second server provides the following endpoints:
/shardmand/v1/replica - returns the 200 status code if a secondary instance is running on the node and the 500 status code if a master instance is running on the node.
/shardmand/v1/master - returns the 200 status code if a master instance is running on the node and the 500 status code if a secondary instance is running on the node.
If both master and secondary instances are running on the node, the /shardmand/v1/replica
and /shardmand/v1/master endpoints return the 404 status code.
/shardmand/v1/status - returns information about the shardmand status.
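A client such as a load balancer check could interpret the two role endpoints as sketched below. The function names probe and node_role, the role labels, and the "unknown" fallback are assumptions made for illustration; only the endpoint paths and the status codes come from the description above.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request to base_url + path."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is still the answer.
        return err.code


def node_role(replica_status: int, master_status: int) -> str:
    """Map the codes of /shardmand/v1/replica and /shardmand/v1/master to a role."""
    if replica_status == 404 and master_status == 404:
        return "mixed"    # both master and secondary instances run on the node
    if master_status == 200 and replica_status == 500:
        return "master"
    if replica_status == 200 and master_status == 500:
        return "replica"
    return "unknown"      # a combination the description does not cover
```

For example, node_role(probe(base, "/shardmand/v1/replica"), probe(base, "/shardmand/v1/master")) classifies a node, where base is the address of the API server, such as http://127.0.0.1:15432 when the default --api-port is used.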
All Shardman services are managed by
shardmand@cluster0.service, so when it is started, stopped or restarted, it
also starts, stops or restarts all other Shardman processes
(including DBMS instances).
The meaning of the command-line options is as follows:
--system-bus
Not used. Left for compatibility. Ignored.
--user user_name
Not used. Left for compatibility. Ignored.
shardmand common options are optional parameters that are not
specific to the utility. They specify etcd connection settings,
the cluster name, and a few more settings. By default, shardmand tries
to connect to the etcd store at 127.0.0.1:2379
and uses the cluster0 cluster name. The default log level is info.
-h, --help
Show brief usage information
--cluster-name cluster_name
Specifies the name for a cluster to operate on. The default is cluster0.
--log-level level
Specifies the log verbosity. Possible values of
level are (from minimum to maximum): error,
warn, info and debug. The default is
info.
--retries number
Specifies how many times shardmanctl retries a failing etcd request. If an etcd request fails, most likely, due to a connectivity issue, shardmanctl retries it the specified number of times before reporting an error. The default is 5.
--session-timeout seconds
Specifies the session timeout for shardmanctl locks. If there is no connectivity between shardmanctl and the etcd store for the specified number of seconds, the lock is released. The default is 30.
--store-endpoints string
Specifies the etcd address in the format:
http[s]://address[:port](,http[s]://address[:port])*. The default is
http://127.0.0.1:2379.
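The endpoint format can be checked before a value is passed to --store-endpoints. The sketch below translates the format string above into a regular expression; the helper name split_endpoints is hypothetical, and the address pattern is a simplifying assumption that accepts host names and IPv4 addresses but not bracketed IPv6 literals.

```python
import re

# One endpoint: http[s]://address[:port]
_ENDPOINT = r"https?://[^\s,:/]+(?::\d+)?"
# The whole value: one endpoint followed by zero or more ",endpoint" parts.
_ENDPOINTS = re.compile(rf"{_ENDPOINT}(?:,{_ENDPOINT})*")


def split_endpoints(value: str) -> list:
    """Validate a --store-endpoints value and return the individual endpoints."""
    if not _ENDPOINTS.fullmatch(value):
        raise ValueError(f"malformed etcd endpoint list: {value!r}")
    return value.split(",")
```

A value without a scheme, such as n1:2379, or one with a trailing comma is rejected with ValueError.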
--store-ca-file string
Verify the certificate of the HTTPS-enabled etcd store server using this CA bundle
--store-cert-file string
Specifies the certificate file for client identification by the etcd store
--store-key string
Specifies the private key file for client identification by the etcd store
--store-timeout duration
Specifies the timeout for an etcd request. The default is 5 seconds.
--monitor-port number
Specifies the port for the shardmand HTTP server for metrics and probes. The default is 15432.
--api-port number
Specifies the port for the shardmand HTTP API server. The default is 15432.
--version
Show shardman-utils version information
A shardmand service reads the environment from
/etc/shardman/shardmand-cluster0.env. The following environment variables
affect the behavior of shardmand.
SDM_CLUSTER_NAME
An alternative to setting the --cluster-name option
SDM_LOG_LEVEL
An alternative to setting the --log-level option
SDM_RETRIES
An alternative to setting the --retries option
SDM_SYSTEM_BUS
An alternative to setting the --system-bus option
SDM_STORE_ENDPOINTS
An alternative to setting the --store-endpoints option
SDM_STORE_CA_FILE
An alternative to setting the --store-ca-file option
SDM_STORE_CERT_FILE
An alternative to setting the --store-cert-file option
SDM_STORE_KEY
An alternative to setting the --store-key option
SDM_STORE_TIMEOUT
An alternative to setting the --store-timeout option
SDM_SESSION_TIMEOUT
An alternative to setting the --session-timeout option
SDM_USER
An alternative to setting the --user option
SDM_MONITOR_PORT
Specifies the port for the shardmand HTTP server for metrics and probes. The default is 15432.
SDM_API_PORT
Specifies the port for the shardmand HTTP API server. The default is 15432.
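The variables listed above follow one naming pattern: the SDM_ prefix plus the option name in upper case with dashes replaced by underscores. The sketch below uses that pattern to spot misspelled variable names in an env file. The helper names are hypothetical, the parser is deliberately simplified (KEY=VALUE lines and # comments only, no quoting), and treating SDM_MONITOR_PORT and SDM_API_PORT as counterparts of --monitor-port and --api-port is an assumption based on their identical descriptions.

```python
KNOWN_OPTIONS = [
    "--cluster-name", "--log-level", "--retries", "--system-bus",
    "--store-endpoints", "--store-ca-file", "--store-cert-file",
    "--store-key", "--store-timeout", "--session-timeout", "--user",
    "--monitor-port", "--api-port",
]


def env_name(option: str) -> str:
    """Derive the environment variable name for a command-line option."""
    return "SDM_" + option.lstrip("-").replace("-", "_").upper()


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def unknown_variables(settings: dict) -> list:
    """Return the variable names that match no documented option."""
    known = {env_name(option) for option in KNOWN_OPTIONS}
    return sorted(name for name in settings if name not in known)
```

Running unknown_variables(parse_env(text)) on the contents of an env file returns an empty list when every variable name is one of those documented here.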
shardmand settings are usually specified in the
/etc/shardman/shardmand-cluster0.env file. If you want
shardmand to connect to an etcd cluster at
hosts n1-n3 using port 2379 and all
Shardman services to use the debug log level, you
can use the following env file:
SDM_STORE_ENDPOINTS=http://n1:2379,http://n2:2379,http://n3:2379
SDM_LOG_LEVEL=debug
Note that you need to restart the shardmand@cluster0 service to
apply new settings from the env file.
To look at shardmand logs, you can use a
journalctl command:
$ journalctl -u shardmand@cluster0.service
You can restart all Shardman services on a node using
a systemctl command:
$ systemctl restart shardmand@cluster0.service