Assume that we have three nodes for deploying Postgres Pro Shardman. Let’s use the first one for the one-node etcd cluster and the other two nodes for the Postgres Pro Shardman two-node cluster.
Suppose that we have the following node names and IP addresses:
10.1.1.10 etcd - etcd node
10.1.1.20 sdm01 - Shardman node1
10.1.1.21 sdm02 - Shardman node2
Each node has 4 GB of RAM, a 20 GB HDD, 2 CPUs, and Ubuntu 22.04 installed.
Add the node names and IP addresses to /etc/hosts. This step must be performed on all nodes.
sudo /bin/sh -c 'cat << EOF >> /etc/hosts
10.1.1.10 etcd
10.1.1.20 sdm01
10.1.1.21 sdm02
EOF'
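Before moving on, it can be handy to confirm that every node's /etc/hosts now lists all three names. A minimal sketch (not part of the official procedure) that checks a hosts-format file for a set of required entries:

```shell
# Sketch: verify that all required host names appear in a hosts-format file.
# Prints the missing names and returns non-zero if any are absent.
check_hosts() {
  file=$1; shift
  missing=""
  for name in "$@"; do
    grep -qw "$name" "$file" || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all present"
}

# Example: check_hosts /etc/hosts etcd sdm01 sdm02
```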
Deploy and start the chrony daemon on all hosts to keep the system clocks synchronized. This step must be performed on all nodes.
sudo apt update
sudo apt install -y chrony
sudo systemctl enable --now chrony
Check that chrony is working properly:
chronyc tracking
Expected output:
Reference ID : C0248F82 (Time100.Stupi.SE)
Stratum : 2
Ref time (UTC) : Tue Apr 18 11:50:44 2023
System time : 0.000019457 seconds slow of NTP time
Last offset : -0.000005579 seconds
RMS offset : 0.000089375 seconds
Frequency : 30.777 ppm fast
Residual freq : -0.000 ppm
Skew : 0.003 ppm
Root delay : 0.018349268 seconds
Root dispersion : 0.000334640 seconds
Update interval : 1039.1 seconds
Leap status : Normal
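If you script the deployment, the two values that matter most here are the leap status and the RMS offset. A hedged sketch that parses the `chronyc tracking` output shown above (the 0.1 s threshold is an illustrative choice, not an official recommendation):

```shell
# Sketch: fail if the clock is unsynchronized or the RMS offset is too large.
# Field positions match the `chronyc tracking` output shown above.
check_tracking() {
  out=$1
  leap=$(printf '%s\n' "$out" | awk '/^Leap status/ {print $NF}')
  rms=$(printf '%s\n' "$out" | awk '/^RMS offset/ {print $4}')
  [ "$leap" = "Normal" ] || { echo "clock not synchronized"; return 1; }
  awk -v o="$rms" 'BEGIN { exit !(o + 0 < 0.1) }' \
    || { echo "RMS offset too large: $rms"; return 1; }
  echo "time sync OK"
}

# Example: check_tracking "$(chronyc tracking)"
```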
Install the following auxiliary packages:
sudo apt install -y vim wget curl iputils-ping
Shardman uses etcd API version 3. The Ubuntu 22.04 repository does contain an etcd package, but its version is too old for Shardman.
Get the latest etcd release from GitHub: download it, unpack it, and move the binaries to the /usr/local/bin directory.
export RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest | grep tag_name | cut -d '"' -f 4)
wget https://github.com/etcd-io/etcd/releases/download/${RELEASE}/etcd-${RELEASE}-linux-amd64.tar.gz
tar xvf etcd-${RELEASE}-linux-amd64.tar.gz
cd etcd-${RELEASE}-linux-amd64
sudo mv etcd etcdctl etcdutl /usr/local/bin
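The `export RELEASE=...` line above simply pulls the `tag_name` field out of the GitHub API response with grep and cut. Isolated as a function, the same pipeline can be checked against a canned reply:

```shell
# The same tag extraction as in the `export RELEASE=...` line above,
# wrapped in a function so it can be tested against canned JSON.
latest_tag() {
  grep tag_name | cut -d '"' -f 4
}

# Example:
# curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest | latest_tag
```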
Create a dedicated etcd system user along with the data and configuration directories:
sudo groupadd --system etcd
sudo useradd -s /sbin/nologin --system -g etcd etcd
sudo mkdir -p /var/lib/etcd/
sudo mkdir -p /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd /etc/etcd
Prepare a systemd unit file for etcd:
sudo /bin/sh -c 'cat << EOF > /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd --name=etcd \
--data-dir=/var/lib/etcd \
--initial-advertise-peer-urls=http://10.1.1.10:2380 \
--listen-peer-urls=http://0.0.0.0:2380 \
--advertise-client-urls=http://10.1.1.10:2379 \
--listen-client-urls=http://0.0.0.0:2379 \
--initial-cluster=etcd=http://etcd:2380
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
EOF'
Start the etcd cluster:
sudo systemctl daemon-reload
sudo systemctl enable --now etcd
sudo systemctl status etcd
Check that the etcd cluster is working properly:
ss -tunelp | grep 2379
tcp LISTEN 0 4096 *:2379 *:* uid:998 ino:25597 sk:1003 cgroup:/system.slice/etcd.service v6only:0 <->
etcdctl member list
9324a99282752a09, started, etcd, http://etcd:2380, http://10.1.1.10:2379, false
etcdctl --endpoints=http://10.1.1.10:2379 endpoint health
http://10.1.1.10:2379 is healthy: successfully committed proposal: took = 2.359542ms
etcdctl put key test
OK
etcdctl get key
key
test
etcdctl del key
1
The one-node etcd cluster is now ready to serve requests.
etcd is a database that uses MVCC to store data, with all the features inherent to that model. Shardman uses etcd quite extensively to store and monitor keys, so you should set up some restrictions to keep the etcd database from growing too large. Shardman always uses the latest key revision, so there is no need to keep the revision history; the database compaction and defragmentation procedures should be run from time to time. If something fails, etcd uses its WAL files to reconstruct the last state, so you should keep several snapshots and WAL files to recover nodes in case of emergency. You should also set a database size limit so that the database does not grow unbounded. The following etcd configuration options are recommended:
max-snapshots: 5
max-wals: 5
quota-backend-bytes: 6442450944
auto-compaction-mode: revision
auto-compaction-retention: 5
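The quota-backend-bytes value above is 6 GiB expressed in bytes; a quick sanity check of the arithmetic:

```shell
# quota-backend-bytes above is 6 GiB; verify the conversion to bytes.
quota=$((6 * 1024 * 1024 * 1024))
echo "$quota"   # 6442450944
```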
With these options added, the systemd unit file will look like this:
sudo /bin/sh -c 'cat << EOF > /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd --name=etcd --data-dir=/var/lib/etcd/ \
--initial-advertise-peer-urls=http://10.1.1.10:2380 \
--listen-peer-urls=http://0.0.0.0:2380 \
--advertise-client-urls=http://10.1.1.10:2379 \
--listen-client-urls=http://0.0.0.0:2379 \
--initial-cluster=etcd=http://etcd:2380 \
--max-snapshots=5 \
--max-wals=5 \
--quota-backend-bytes=6442450944 \
--auto-compaction-mode=revision \
--auto-compaction-retention=5
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
EOF'
Let’s add the public Postgres Pro Shardman repository on each node:
sudo /bin/sh -c 'printf "deb [arch=amd64] http://repo.postgrespro.ru/pgprosm-14/ubuntu/ jammy main" > /etc/apt/sources.list.d/shardman.list'
sudo curl -fsSL http://repo.postgrespro.ru/pgprosm-14/keys/GPG-KEY-POSTGRESPRO | sudo apt-key add -
The next step is to install the packages (on each node):
sudo apt update
sudo apt install -y postgrespro-sdm-14-server postgrespro-sdm-14-client postgrespro-sdm-14-contrib postgrespro-sdm-14-libs pg-probackup-sdm-14 shardman-services shardman-tools
Shardman executables are installed into the /opt/pgpro/sdm-14/bin folder. Since Shardman uses the etcd cluster to store its configuration, let’s set some environment variables to simplify the cluster start process (on each node):
export PATH=$PATH:/opt/pgpro/sdm-14/bin
export SDM_STORE_ENDPOINTS=http://10.1.1.10:2379
export SDM_CLUSTER_NAME=cluster0
export SDM_LOG_LEVEL=debug
Setting these environment variables is equivalent to using the corresponding shardmanctl options (e.g., --log-level=debug, --cluster-name=cluster0, and so on). The following examples use the environment variables.
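To illustrate that equivalence (this is an illustration, not Shardman source code): a tool that accepts both forms typically resolves each setting from the environment with a built-in default, which the shell's `${VAR:-default}` expansion expresses directly:

```shell
# Illustration only: an SDM_* environment variable plays the same role as
# the corresponding shardmanctl option, so each setting can be resolved
# from the environment with a fallback default.
cluster_name=${SDM_CLUSTER_NAME:-cluster0}
log_level=${SDM_LOG_LEVEL:-info}
echo "--cluster-name=$cluster_name --log-level=$log_level"
```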
Suppose we have chosen cluster0 as the cluster name. The next step is to put the Shardman environment variables into a file in the /etc/shardman directory (on each node):
sudo mkdir -p /etc/shardman
sudo sh -c 'cat << EOF > /etc/shardman/shardmand-cluster0.env
SDM_CLUSTER_NAME=cluster0
SDM_LOG_LEVEL=info
SDM_STORE_ENDPOINTS=http://10.1.1.10:2379
EOF'
Let’s generate a sample configuration with the Shardman tools (on one node only):
shardmanctl config generate > spec.json
At this step, you can make changes to the cluster specification (configuration), e.g., change the password or PostgreSQL parameters such as shared_buffers.
Now for the final steps. First, let's initialize the cluster configuration in etcd (on any one node):
shardmanctl init -f spec.json
Expected output is:
2023-04-18T12:30:03.043Z DEBUG cmd/common.go:100 Waiting for metadata lock...
2023-04-18T12:30:03.048Z DEBUG cluster/cluster.go:365 DataDir is not specified, setting to default /var/lib/pgpro/sdm-14/data
Enable and start the shardmand service (on each node):
sudo systemctl enable --now shardmand@cluster0
sudo systemctl status shardmand@cluster0
● shardmand@cluster0.service - deployment daemon for shardman
Loaded: loaded (/lib/systemd/system/shardmand@.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-04-18 12:28:18 UTC; 2min 13s ago
Docs: https://github.com/postgrespro/shardman
Main PID: 618 (shardmand)
Tasks: 10 (limit: 4571)
Memory: 32.0M
CPU: 422ms
CGroup: /system.slice/system-shardmand.slice/shardmand@cluster0.service
└─618 /opt/pgpro/sdm-14/bin/shardmand --cluster-name cluster0 --system-bus --user postgres
At this step, we assume that all the previous steps were executed successfully: the etcd cluster is working properly, the clocks on all hosts are synchronized, and the shardmand daemon is running on sdm01 and sdm02.
The final step should be executed as the postgres Linux user:
sudo su - postgres
/opt/pgpro/sdm-14/bin/shardmanctl nodes add -n sdm01,sdm02 \
--cluster-name cluster0 \
--log-level debug \
--store-endpoints=http://10.1.1.10:2379
Expected output should be:
2023-04-18T12:43:11.300Z DEBUG cmd/common.go:100 Waiting for metadata lock...
2023-04-18T12:43:11.306Z INFO cluster/store.go:277 Checking if shardmand on all nodes have applied current cluster configuration
✓ Waiting for shardmand on node sdm01 to apply current configuration: success 0.000s
✓ Waiting for shardmand on node sdm02 to apply current configuration: success 0.000s
2023-04-18T12:43:11.307Z INFO add/case.go:112 Initting Stolon instances...
2023-04-18T12:43:11.312Z INFO add/case.go:170 Waiting for Stolon daemons to start... make sure shardmand daemons are running on the nodes
✓ Waiting for Stolon daemons of rg clover-1-sdm01: success 31.012s
✓ Waiting for Stolon daemons of rg clover-1-sdm02: success 0.012s
2023-04-18T12:43:42.336Z INFO add/case.go:187 Adding repgroups...
✓ waiting rg 1 config apply: done 7.014s
2023-04-18T12:43:49.444Z DEBUG broadcaster/worker.go:33 start broadcaster worker for repgroup id=1
2023-04-18T12:43:49.453Z DEBUG broadcaster/worker.go:51 repgroup 1 connect established
2023-04-18T12:43:49.453Z DEBUG commands/addrepgroup.go:575 waiting for extension lock...
2023-04-18T12:43:49.453Z DEBUG commands/addrepgroup.go:137 Loading schema into replication group rg 1
...
2023-04-18T12:44:25.665Z DEBUG rebalance/service.go:528 wait all tasks finish
2023-04-18T12:44:25.666Z DEBUG broadcaster/worker.go:75 finish broadcaster worker for repgroup id=1
2023-04-18T12:44:25.666Z DEBUG broadcaster/worker.go:75 finish broadcaster worker for repgroup id=2
2023-04-18T12:44:25.666Z INFO add/case.go:221 Successfully added nodes sdm01, sdm02 to the cluster
The message “Successfully added nodes sdm01, sdm02 to the cluster” means that everything is fine
and nodes sdm01 and sdm02 are working properly.
Let's check the status of the cluster nodes:
shardmanctl status
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == STORE STATUS == │
├───────────┬──────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ Warning │ Store has only one member, consider │ │ │
│ │ deploying store cluster │ │ │
└───────────┴──────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == TOPOLOGY STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ CROSS │ Topology placement policy is CROSS │ │ │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == METADATA STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Metadata is OK │ │ │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == SHARDMAND STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ shardmand on node sdm01 is OK │ │ sdm01 │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ shardmand on node sdm02 is OK │ │ sdm02 │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == REPLICATION GROUP STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm01 is OK │ clover-1-sdm01 │ │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm02 is OK │ clover-1-sdm02 │ │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == MASTER STATUS == │
├───────────┬──────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm01 master │ clover-1-sdm01 │ sdm01:5432 │
│ │ is running on sdm01:5432 │ │ │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm02 master │ clover-1-sdm02 │ sdm02:5432 │
│ │ is running on sdm02:5432 │ │ │
└───────────┴──────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == DICTIONARY STATUS == │
├───────────┬──────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm01 │ clover-1-sdm01 │ │
│ │ dictionary is OK │ │ │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm02 │ clover-1-sdm02 │ │
│ │ dictionary is OK │ │ │
└───────────┴──────────────────────────────────────────┴─────────────────────┴─────────────────────┘
To connect to the cluster, we should get the cluster connection string on any cluster node (sdm01 or sdm02):
shardmanctl getconnstr
dbname=postgres host=sdm01,sdm02 password=!!!CHANGE_ME!!! port=5432,5432 user=postgres
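The connection string is a space-separated list of libpq-style key=value pairs. A small sketch for pulling a single key out of such a string (assuming, as in the string above, that values contain no spaces):

```shell
# Sketch: extract one key's value from a libpq-style connection string.
# Assumes values contain no spaces (true for the string shown above).
connstr_get() {
  printf '%s\n' "$1" | tr ' ' '\n' | awk -F= -v k="$2" '$1 == k {print $2}'
}

connstr='dbname=postgres host=sdm01,sdm02 port=5432,5432 user=postgres'
connstr_get "$connstr" host   # prints sdm01,sdm02
```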
And then let’s try to connect:
psql -d 'dbname=postgres host=sdm01,sdm02 password=!!!CHANGE_ME!!! port=5432,5432 user=postgres'
psql (14.7)
Type "help" for help.
postgres=#
Let's try to create a distributed table and check if everything is working properly.
postgres=# create table x(id int primary key, t text) with (distributed_by='id',num_parts=2);
CREATE TABLE
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+---------+-------------------+----------
public | x | partitioned table | postgres
public | x_0 | table | postgres
public | x_1_fdw | foreign table | postgres
(3 rows)
postgres=# \d x_0
Table "public.x_0"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | not null |
t | text | | |
Partition of: x FOR VALUES WITH (modulus 2, remainder 0)
Indexes:
"x_0_pkey" PRIMARY KEY, btree (id)
postgres=# \d x_1_fdw
Foreign table "public.x_1_fdw"
Column | Type | Collation | Nullable | Default | FDW options
--------+---------+-----------+----------+---------+-------------
id | integer | | not null | |
t | text | | | |
Partition of: x FOR VALUES WITH (modulus 2, remainder 1)
Server: shardman_rg_2
FDW options: (table_name 'x_1')
postgres=# insert into x values (1,'t'),(2,'t'),(3,'t');
INSERT 0 3
postgres=# select * from x_0;
id | t
----+---
1 | t
2 | t
(2 rows)
postgres=# select * from x_1_fdw;
id | t
----+---
3 | t
(1 row)
Everything works as expected.