Assume that we have three nodes for deploying PostgresPro Shardman. Let’s use the first one for a one-node etcd cluster, and the other two for a two-node PostgresPro Shardman cluster.
Let’s suppose that we have the following node names and IP addresses:
10.1.1.10 etcd - etcd node
10.1.1.20 sdm01 - Shardman node1
10.1.1.21 sdm02 - Shardman node2
Each node has 4 GB RAM, a 20 GB HDD, 2 CPUs, and Ubuntu 22.04 installed.
Add the node names to /etc/hosts. This step must be performed on all nodes.
sudo /bin/sh -c 'cat << EOF >> /etc/hosts
10.1.1.10 etcd
10.1.1.20 sdm01
10.1.1.21 sdm02
EOF'
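The heredoc above appends unconditionally, so re-running the setup duplicates lines. A minimal idempotent sketch (same names and addresses assumed) that only appends entries that are missing:

```shell
# Idempotently add the cluster hosts to a hosts file (default /etc/hosts);
# safe to re-run because lines already present are skipped.
add_cluster_hosts() {
    file=${1:-/etc/hosts}
    for entry in "10.1.1.10 etcd" "10.1.1.20 sdm01" "10.1.1.21 sdm02"; do
        grep -qxF "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
    done
}
# Usage (as root): add_cluster_hosts
```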
This step must also be performed on all nodes: deploy and start the chrony daemon on every host.
sudo apt update
sudo apt install -y chrony
sudo systemctl enable --now chrony
Check that chrony is working properly.
chronyc tracking
Expected output:
Reference ID : C0248F82 (Time100.Stupi.SE)
Stratum : 2
Ref time (UTC) : Tue Apr 18 11:50:44 2023
System time : 0.000019457 seconds slow of NTP time
Last offset : -0.000005579 seconds
RMS offset : 0.000089375 seconds
Frequency : 30.777 ppm fast
Residual freq : -0.000 ppm
Skew : 0.003 ppm
Root delay : 0.018349268 seconds
Root dispersion : 0.000334640 seconds
Update interval : 1039.1 seconds
Leap status : Normal
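If you want to script this check, here is a small hedged sketch that parses the "System time" line of the chronyc tracking output and fails when the offset exceeds an arbitrary 0.1 s threshold:

```shell
# Read `chronyc tracking` output on stdin and exit non-zero if the
# system clock is off by more than 0.1 s (sketch; threshold is arbitrary).
check_offset() {
    awk -F': ' '/^System time/ {
        split($2, f, " ")            # f[1] = absolute offset in seconds
        exit (f[1] + 0 > 0.1) ? 1 : 0
    }'
}
# Usage: chronyc tracking | check_offset && echo "clock in sync"
```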
Install the following auxiliary packages:
sudo apt install -y vim wget curl iputils-ping
We use etcd API version 3. The Ubuntu 22.04 repository does ship etcd, but the version there is too old for Shardman to work properly.
So get the latest etcd release from GitHub: download it, unpack it, and move the binaries to the /usr/local/bin folder.
export RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest | grep tag_name | cut -d '"' -f 4)
wget https://github.com/etcd-io/etcd/releases/download/${RELEASE}/etcd-${RELEASE}-linux-amd64.tar.gz
tar xvf etcd-${RELEASE}-linux-amd64.tar.gz
cd etcd-${RELEASE}-linux-amd64
sudo mv etcd etcdctl etcdutl /usr/local/bin
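Shardman requires the etcd v3 API, so it is worth a quick sanity check on the tag you just downloaded. The parsing below is a sketch; v3.5.9 is only a fallback example used when $RELEASE is unset:

```shell
# Extract the major version from the release tag (e.g. v3.5.9 -> 3).
RELEASE=${RELEASE:-v3.5.9}
major=${RELEASE#v}       # strip the leading "v"
major=${major%%.*}       # keep everything before the first dot
[ "$major" -ge 3 ] && echo "etcd ${RELEASE} provides the v3 API"
```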
Create a dedicated system user and the data and configuration directories for etcd:
sudo groupadd --system etcd
sudo useradd -s /sbin/nologin --system -g etcd etcd
sudo mkdir -p /var/lib/etcd/
sudo mkdir -p /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd /etc/etcd
Prepare systemd unit file for etcd.
sudo /bin/sh -c 'cat << EOF > /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd --name=etcd \
--data-dir=/var/lib/etcd \
--initial-advertise-peer-urls=http://10.1.1.10:2380 \
--listen-peer-urls=http://0.0.0.0:2380 \
--advertise-client-urls=http://10.1.1.10:2379 \
--listen-client-urls=http://0.0.0.0:2379 \
--initial-cluster=etcd=http://etcd:2380
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
EOF'
Start the etcd service:
sudo systemctl daemon-reload
sudo systemctl enable --now etcd
sudo systemctl status etcd
Check that the etcd cluster is working properly:
ss -tunelp | grep 2379
tcp LISTEN 0 4096 *:2379 *:* uid:998 ino:25597 sk:1003 cgroup:/system.slice/etcd.service v6only:0 <->
etcdctl member list
9324a99282752a09, started, etcd, http://etcd:2380, http://10.1.1.10:2379, false
etcdctl --endpoints=http://10.1.1.10:2379 endpoint health
http://10.1.1.10:2379 is healthy: successfully committed proposal: took = 2.359542ms
etcdctl put key test
OK
etcdctl get key
key
test
etcdctl del key
1
One-node etcd cluster is ready to serve requests.
Etcd is a database that uses MVCC to store data, with all the properties inherent in that model. Shardman uses etcd quite extensively to store and monitor keys, so you should configure a few limits to keep the etcd database from growing in size. Shardman always uses the latest key revision, hence there is no need to keep the revision history, and the database compaction and defragmentation procedures should be run from time to time. If something fails, etcd uses its WAL files to reconstruct the last state, so you should keep a number of snapshots and WALs to recover nodes in case of emergency. You should also set a database size limit so that the database does not grow too large. The following etcd configuration options are recommended:
max-snapshots: 5
max-wals: 5
quota-backend-bytes: 6442450944
auto-compaction-mode: revision
auto-compaction-retention: 5
The systemd unit file will look like this:
sudo /bin/sh -c 'cat << EOF > /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd --name=etcd --data-dir=/var/lib/etcd/ \
--initial-advertise-peer-urls=http://10.1.1.10:2380 \
--listen-peer-urls=http://0.0.0.0:2380 \
--advertise-client-urls=http://10.1.1.10:2379 \
--listen-client-urls=http://0.0.0.0:2379 \
--initial-cluster=etcd=http://etcd:2380 \
--max-snapshots=5 \
--max-wals=5 \
--quota-backend-bytes=6442450944 \
--auto-compaction-mode=revision \
--auto-compaction-retention=5
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
EOF'
After updating the unit file, reload systemd and restart etcd so that the new options take effect:
sudo systemctl daemon-reload
sudo systemctl restart etcd
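If you would rather keep these settings in a configuration file than on the ExecStart line, etcd also accepts --config-file. Note that when a config file is supplied, etcd ignores the other command-line flags, so the file must carry the full configuration (quota-backend-bytes 6442450944 equals 6 GiB). A sketch of an equivalent /etc/etcd/etcd.conf.yml:

```yaml
# /etc/etcd/etcd.conf.yml -- equivalent to the flags above; used via
# ExecStart=/usr/local/bin/etcd --config-file=/etc/etcd/etcd.conf.yml
name: etcd
data-dir: /var/lib/etcd
initial-advertise-peer-urls: http://10.1.1.10:2380
listen-peer-urls: http://0.0.0.0:2380
advertise-client-urls: http://10.1.1.10:2379
listen-client-urls: http://0.0.0.0:2379
initial-cluster: etcd=http://etcd:2380
max-snapshots: 5
max-wals: 5
quota-backend-bytes: 6442450944
auto-compaction-mode: revision
auto-compaction-retention: "5"
```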
Let’s add the PostgresPro public Shardman repository on each node:
sudo /bin/sh -c 'printf "deb [arch=amd64] http://repo.postgrespro.ru/pgprosm-14/ubuntu/ jammy main" > /etc/apt/sources.list.d/shardman.list'
sudo curl -fsSL http://repo.postgrespro.ru/pgprosm-14/keys/GPG-KEY-POSTGRESPRO | sudo apt-key add -
The next step is package installation (on each node):
sudo apt update
sudo apt install -y postgrespro-sdm-14-server postgrespro-sdm-14-client postgrespro-sdm-14-contrib postgrespro-sdm-14-libs pg-probackup-sdm-14 shardman-services shardman-tools
Shardman executables are installed into the /opt/pgpro/sdm-14/bin folder. Shardman uses the etcd cluster to store its configuration, so let’s set some environment variables to simplify the cluster startup process (on each node):
export PATH=$PATH:/opt/pgpro/sdm-14/bin
export SDM_STORE_ENDPOINTS=http://10.1.1.10:2379
export SDM_CLUSTER_NAME=cluster0
export SDM_LOG_LEVEL=debug
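These exports only last for the current shell session. To keep them across logins, you can append them to the shell profile of the user who runs shardmanctl (a sketch assuming ~/.profile is read by your shell):

```shell
# Persist the Shardman environment for future sessions.
cat << 'EOF' >> ~/.profile
export PATH=$PATH:/opt/pgpro/sdm-14/bin
export SDM_STORE_ENDPOINTS=http://10.1.1.10:2379
export SDM_CLUSTER_NAME=cluster0
export SDM_LOG_LEVEL=debug
EOF
```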
Setting these environment variables is equivalent to passing the corresponding shardmanctl command-line options (e.g. --log-level=debug, --cluster-name=cluster0, etc.). In the following examples we use the environment variables.
Suppose we have chosen the cluster name cluster0. The next step is to put the Shardman environment variables into the /etc/shardman folder (on each node):
sudo mkdir -p /etc/shardman
sudo sh -c 'cat << EOF > /etc/shardman/shardmand-cluster0.env
SDM_CLUSTER_NAME=cluster0
SDM_LOG_LEVEL=info
SDM_STORE_ENDPOINTS=http://10.1.1.10:2379
EOF'
Let’s generate a sample configuration with the Shardman tools (on one node only).
shardmanctl config generate > spec.json
At this step you can change the cluster specification (configuration), e.g. the password or PostgreSQL parameters such as shared_buffers.
Now for the final steps. First, let’s load the cluster configuration into etcd (on any one node).
shardmanctl init -f spec.json
Expected output is:
2023-04-18T12:30:03.043Z DEBUG cmd/common.go:100 Waiting for metadata lock...
2023-04-18T12:30:03.048Z DEBUG cluster/cluster.go:365 DataDir is not specified, setting to default /var/lib/pgpro/sdm-14/data
Enable and start shardmand service (on each node):
sudo systemctl enable --now shardmand@cluster0
sudo systemctl status shardmand@cluster0
● shardmand@cluster0.service - deployment daemon for shardman
Loaded: loaded (/lib/systemd/system/shardmand@.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-04-18 12:28:18 UTC; 2min 13s ago
Docs: https://github.com/postgrespro/shardman
Main PID: 618 (shardmand)
Tasks: 10 (limit: 4571)
Memory: 32.0M
CPU: 422ms
CGroup: /system.slice/system-shardmand.slice/shardmand@cluster0.service
└─618 /opt/pgpro/sdm-14/bin/shardmand --cluster-name cluster0 --system-bus --user postgres
At this step we assume that all previous steps completed successfully: the etcd cluster is working properly, time is synchronized on all hosts, and the shardmand daemon is running on sdm01 and sdm02. The final step should be executed as the postgres Linux user:
sudo su - postgres
/opt/pgpro/sdm-14/bin/shardmanctl nodes add -n sdm01,sdm02 \
--cluster-name cluster0 \
--log-level debug \
--store-endpoints=http://10.1.1.10:2379
Expected output should be:
2023-04-18T12:43:11.300Z DEBUG cmd/common.go:100 Waiting for metadata lock...
2023-04-18T12:43:11.306Z INFO cluster/store.go:277 Checking if shardmand on all nodes have applied current cluster configuration
✓ Waiting for shardmand on node sdm01 to apply current configuration: success 0.000s
✓ Waiting for shardmand on node sdm02 to apply current configuration: success 0.000s
2023-04-18T12:43:11.307Z INFO add/case.go:112 Initting Stolon instances...
2023-04-18T12:43:11.312Z INFO add/case.go:170 Waiting for Stolon daemons to start... make sure shardmand daemons are running on the nodes
✓ Waiting for Stolon daemons of rg clover-1-sdm01: success 31.012s
✓ Waiting for Stolon daemons of rg clover-1-sdm02: success 0.012s
2023-04-18T12:43:42.336Z INFO add/case.go:187 Adding repgroups...
✓ waiting rg 1 config apply: done 7.014s
2023-04-18T12:43:49.444Z DEBUG broadcaster/worker.go:33 start broadcaster worker for repgroup id=1
2023-04-18T12:43:49.453Z DEBUG broadcaster/worker.go:51 repgroup 1 connect established
2023-04-18T12:43:49.453Z DEBUG commands/addrepgroup.go:575 waiting for extension lock...
2023-04-18T12:43:49.453Z DEBUG commands/addrepgroup.go:137 Loading schema into replication group rg 1
...
2023-04-18T12:44:25.665Z DEBUG rebalance/service.go:528 wait all tasks finish
2023-04-18T12:44:25.666Z DEBUG broadcaster/worker.go:75 finish broadcaster worker for repgroup id=1
2023-04-18T12:44:25.666Z DEBUG broadcaster/worker.go:75 finish broadcaster worker for repgroup id=2
2023-04-18T12:44:25.666Z INFO add/case.go:221 Successfully added nodes sdm01, sdm02 to the cluster
The message "Successfully added nodes sdm01, sdm02 to the cluster" means that everything is fine and that nodes sdm01 and sdm02 are working properly.
Let's check the status of the cluster nodes:
shardmanctl status
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == STORE STATUS == │
├───────────┬──────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ Warning │ Store has only one member, consider │ │ │
│ │ deploying store cluster │ │ │
└───────────┴──────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == TOPOLOGY STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ CROSS │ Topology placement policy is CROSS │ │ │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == METADATA STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Metadata is OK │ │ │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == SHARDMAND STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ shardmand on node sdm01 is OK │ │ sdm01 │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ shardmand on node sdm02 is OK │ │ sdm02 │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == REPLICATION GROUP STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm01 is OK │ clover-1-sdm01 │ │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm02 is OK │ clover-1-sdm02 │ │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == MASTER STATUS == │
├───────────┬──────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm01 master │ clover-1-sdm01 │ sdm01:5432 │
│ │ is running on sdm01:5432 │ │ │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm02 master │ clover-1-sdm02 │ sdm02:5432 │
│ │ is running on sdm02:5432 │ │ │
└───────────┴──────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == DICTIONARY STATUS == │
├───────────┬──────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm01 │ clover-1-sdm01 │ │
│ │ dictionary is OK │ │ │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm02 │ clover-1-sdm02 │ │
│ │ dictionary is OK │ │ │
└───────────┴──────────────────────────────────────────┴─────────────────────┴─────────────────────┘
To connect to the cluster, get the cluster connection string on any cluster node (sdm01 or sdm02):
shardmanctl getconnstr
dbname=postgres host=sdm01,sdm02 password=!!!CHANGE_ME!!! port=5432,5432 user=postgres
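When scripting against the cluster, you may want individual fields out of this string. A small parsing sketch that assumes the space-separated key=value format shown above:

```shell
# Extract the host list from a libpq key=value connection string.
connstr='dbname=postgres host=sdm01,sdm02 password=!!!CHANGE_ME!!! port=5432,5432 user=postgres'
hosts=$(printf '%s\n' "$connstr" | tr ' ' '\n' | sed -n 's/^host=//p')
echo "$hosts"    # sdm01,sdm02
```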
And then let’s try to connect:
psql -d 'dbname=postgres host=sdm01,sdm02 password=!!!CHANGE_ME!!! port=5432,5432 user=postgres'
psql (14.7)
Type "help" for help.
postgres=#
Let's try to create a distributed table and check if everything is working properly.
postgres=# create table x(id int primary key, t text) with (distributed_by='id',num_parts=2);
CREATE TABLE
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+---------+-------------------+----------
public | x | partitioned table | postgres
public | x_0 | table | postgres
public | x_1_fdw | foreign table | postgres
(3 rows)
postgres=# \d x_0
Table "public.x_0"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | not null |
t | text | | |
Partition of: x FOR VALUES WITH (modulus 2, remainder 0)
Indexes:
"x_0_pkey" PRIMARY KEY, btree (id)
postgres=# \d x_1_fdw
Foreign table "public.x_1_fdw"
Column | Type | Collation | Nullable | Default | FDW options
--------+---------+-----------+----------+---------+-------------
id | integer | | not null | |
t | text | | | |
Partition of: x FOR VALUES WITH (modulus 2, remainder 1)
Server: shardman_rg_2
FDW options: (table_name 'x_1')
postgres=# insert into x values (1,'t'),(2,'t'),(3,'t');
INSERT 0 3
postgres=# select * from x_0;
id | t
----+---
1 | t
2 | t
(2 rows)
postgres=# select * from x_1_fdw;
id | t
----+---
3 | t
(1 row)
Everything works as expected.