Assume that we have five nodes for deploying Postgres Pro Shardman. Let’s use the first three of them for the etcd three-node cluster and the other two nodes for the Postgres Pro Shardman two-node cluster.
Suppose that we have the following node names and IP addresses:
10.1.1.11 etcd1 - etcd node
10.1.1.12 etcd2 - etcd node
10.1.1.13 etcd3 - etcd node
10.1.1.20 sdm01 - Shardman node1
10.1.1.21 sdm02 - Shardman node2
Each node has 4 GB of RAM, a 20 GB HDD, 2 CPUs, and Ubuntu 22.04 installed.
Add the node names and IP addresses to /etc/hosts. This step must be performed on all nodes.
sudo /bin/sh -c 'cat << EOF >> /etc/hosts
10.1.1.11 etcd1
10.1.1.12 etcd2
10.1.1.13 etcd3
10.1.1.20 sdm01
10.1.1.21 sdm02
EOF'
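A quick sanity check (a sketch, not part of the official procedure): confirm that every node name now resolves. getent consults /etc/hosts, so this catches a typo in the entries above.

```shell
# Check that each cluster hostname resolves; getent reads /etc/hosts,
# so a typo in the entries above shows up as "NOT resolvable".
for h in etcd1 etcd2 etcd3 sdm01 sdm02; do
    getent hosts "$h" > /dev/null && echo "$h: ok" || echo "$h: NOT resolvable"
done
```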
This step must be performed on all nodes.
Deploy and start the chrony daemon on all hosts.
sudo apt update
sudo apt install -y chrony
sudo systemctl enable --now chrony
Check that chrony is working properly.
chronyc tracking
Expected output:
Reference ID : C0248F82 (Time100.Stupi.SE)
Stratum : 2
Ref time (UTC) : Tue Apr 18 11:50:44 2023
System time : 0.000019457 seconds slow of NTP time
Last offset : -0.000005579 seconds
RMS offset : 0.000089375 seconds
Frequency : 30.777 ppm fast
Residual freq : -0.000 ppm
Skew : 0.003 ppm
Root delay : 0.018349268 seconds
Root dispersion : 0.000334640 seconds
Update interval : 1039.1 seconds
Leap status : Normal
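If you want to automate this check, a small sketch like the following parses the "System time" line and flags drift above 100 ms. check_offset is a hypothetical helper name, and the sample line below is copied from the output above; in practice you would pipe `chronyc tracking` into it.

```shell
# Parse the "System time" line of `chronyc tracking` output and warn if
# the offset from NTP time exceeds 0.1 s (check_offset is illustrative).
check_offset() {
    awk -F': *' '/^System time/ {
        split($2, f, " ")
        if (f[1] + 0 > 0.1) print "WARN: offset " f[1] "s"
        else print "offset OK: " f[1] "s"
    }'
}
# In practice: chronyc tracking | check_offset
echo "System time     : 0.000019457 seconds slow of NTP time" | check_offset
```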
Install the following helper packages on each etcd node (etcd1, etcd2, etcd3):
sudo apt install -y vim wget curl iputils-ping
We use etcd API version 3. The Ubuntu 22.04 repository does contain etcd, but the version there is too old for Shardman. Get the latest etcd release from GitHub: download it, unpack it, and move the binaries to the /usr/local/bin folder.
export RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest | grep tag_name | cut -d '"' -f 4)
wget https://github.com/etcd-io/etcd/releases/download/${RELEASE}/etcd-${RELEASE}-linux-amd64.tar.gz
tar xvf etcd-${RELEASE}-linux-amd64.tar.gz
cd etcd-${RELEASE}-linux-amd64
sudo mv etcd etcdctl etcdutl /usr/local/bin
sudo groupadd --system etcd
sudo useradd -s /sbin/nologin --system -g etcd etcd
sudo mkdir -p /var/lib/etcd/
sudo mkdir -p /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd /etc/etcd
Prepare a systemd unit file for etcd on each node.
Node etcd1:
sudo /bin/sh -c 'cat << EOF > /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \
--name=etcd1 \
--data-dir=/var/lib/etcd/ \
--initial-advertise-peer-urls=http://10.1.1.11:2380 \
--listen-peer-urls=http://0.0.0.0:2380 \
--advertise-client-urls=http://10.1.1.11:2379 \
--listen-client-urls=http://0.0.0.0:2379 \
--initial-cluster=etcd1=http://10.1.1.11:2380,etcd2=http://10.1.1.12:2380,etcd3=http://10.1.1.13:2380 \
--initial-cluster-token=secret-token \
--initial-cluster-state=new \
--max-snapshots=5 \
--max-wals=5 \
--quota-backend-bytes=6442450944 \
--auto-compaction-mode=revision \
--auto-compaction-retention=5
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
EOF'
Node etcd2:
sudo /bin/sh -c 'cat << EOF > /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \
--name=etcd2 \
--data-dir=/var/lib/etcd/ \
--initial-advertise-peer-urls=http://10.1.1.12:2380 \
--listen-peer-urls=http://0.0.0.0:2380 \
--advertise-client-urls=http://10.1.1.12:2379 \
--listen-client-urls=http://0.0.0.0:2379 \
--initial-cluster=etcd1=http://10.1.1.11:2380,etcd2=http://10.1.1.12:2380,etcd3=http://10.1.1.13:2380 \
--initial-cluster-token=secret-token \
--initial-cluster-state=new \
--max-snapshots=5 \
--max-wals=5 \
--quota-backend-bytes=6442450944 \
--auto-compaction-mode=revision \
--auto-compaction-retention=5
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
EOF'
Node etcd3:
sudo /bin/sh -c 'cat << EOF > /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \
--name=etcd3 \
--data-dir=/var/lib/etcd/ \
--initial-advertise-peer-urls=http://10.1.1.13:2380 \
--listen-peer-urls=http://0.0.0.0:2380 \
--advertise-client-urls=http://10.1.1.13:2379 \
--listen-client-urls=http://0.0.0.0:2379 \
--initial-cluster=etcd1=http://10.1.1.11:2380,etcd2=http://10.1.1.12:2380,etcd3=http://10.1.1.13:2380 \
--initial-cluster-token=secret-token \
--initial-cluster-state=new \
--max-snapshots=5 \
--max-wals=5 \
--quota-backend-bytes=6442450944 \
--auto-compaction-mode=revision \
--auto-compaction-retention=5
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
EOF'
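The three unit files above differ only in the member name and the advertised IP, so they can also be generated from two variables. The sketch below is an optional shortcut, not part of the official procedure; NODE_NAME, NODE_IP, and UNIT_PATH are illustrative placeholders. On a real node, set them per host, point UNIT_PATH at /etc/systemd/system/etcd.service, and run with sudo.

```shell
# Generate the etcd unit file from per-node variables instead of copying
# it three times. Set NODE_NAME/NODE_IP on each host before running.
NODE_NAME=etcd1
NODE_IP=10.1.1.11
UNIT_PATH=./etcd.service   # use /etc/systemd/system/etcd.service on a real node
cat << EOF > "$UNIT_PATH"
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \\
--name=${NODE_NAME} \\
--data-dir=/var/lib/etcd/ \\
--initial-advertise-peer-urls=http://${NODE_IP}:2380 \\
--listen-peer-urls=http://0.0.0.0:2380 \\
--advertise-client-urls=http://${NODE_IP}:2379 \\
--listen-client-urls=http://0.0.0.0:2379 \\
--initial-cluster=etcd1=http://10.1.1.11:2380,etcd2=http://10.1.1.12:2380,etcd3=http://10.1.1.13:2380 \\
--initial-cluster-token=secret-token \\
--initial-cluster-state=new \\
--max-snapshots=5 \\
--max-wals=5 \\
--quota-backend-bytes=6442450944 \\
--auto-compaction-mode=revision \\
--auto-compaction-retention=5
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
EOF
```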
Start the etcd cluster
On each cluster node:
sudo systemctl daemon-reload
sudo systemctl enable --now etcd
sudo systemctl status etcd
Check that the etcd cluster is working properly.
Run the following commands on any node.
ss -tunelp | grep 2379
Expected output:
tcp LISTEN 0 4096 *:2379 *:* uid:998 ino:25597 sk:1003 cgroup:/system.slice/etcd.service v6only:0 <->
etcdctl --endpoints=http://10.1.1.11:2379,http://10.1.1.12:2379,http://10.1.1.13:2379 member list -w table
Expected output:
+------------------+---------+---------------+----------------------------+----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+---------------+----------------------------+----------------------------+------------+
| 4ea1b7505b50a919 | started | etcd1 | http://10.1.1.11:2380 | http://10.1.1.11:2379 | false |
| 709c7bac14ddefcc | started | etcd2 | http://10.1.1.12:2380 | http://10.1.1.12:2379 | false |
| bdb72e6a663cdd2a | started | etcd3 | http://10.1.1.13:2380 | http://10.1.1.13:2379 | false |
+------------------+---------+---------------+----------------------------+----------------------------+------------+
etcdctl --endpoints=http://10.1.1.11:2379,http://10.1.1.12:2379,http://10.1.1.13:2379 endpoint health -w table
Expected output:
+----------------------------+--------+------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+----------------------------+--------+------------+-------+
| http://10.1.1.11:2379 | true | 2.045478ms | |
| http://10.1.1.12:2379 | true | 2.260022ms | |
| http://10.1.1.13:2379 | true | 2.167308ms | |
+----------------------------+--------+------------+-------+
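This check can also be scripted. The sketch below counts the healthy endpoints in the table output; parse_health is a hypothetical helper, and the sample lines mimic the output above. In practice you would pipe the `etcdctl ... endpoint health -w table` output into it and expect the count 3.

```shell
# Count endpoints whose HEALTH column reads "true" in the table output
# (parse_health is an illustrative helper name).
parse_health() {
    grep -c '| *true *|'
}
# In practice: etcdctl --endpoints=... endpoint health -w table | parse_health
printf '%s\n' \
  '| http://10.1.1.11:2379 | true | 2.045478ms | |' \
  '| http://10.1.1.12:2379 | true | 2.260022ms | |' \
  '| http://10.1.1.13:2379 | true | 2.167308ms | |' | parse_health
```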
etcdctl --endpoints=http://10.1.1.11:2379,http://10.1.1.12:2379,http://10.1.1.13:2379 endpoint status -w table
Expected output:
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://10.1.1.11:2379 | 4ea1b7505b50a919 | 3.5.8 | 799 kB | true | false | 14 | 794083 | 794083 | |
| http://10.1.1.12:2379 | 709c7bac14ddefcc | 3.5.8 | 799 kB | false | false | 14 | 794083 | 794083 | |
| http://10.1.1.13:2379 | bdb72e6a663cdd2a | 3.5.8 | 795 kB | false | false | 14 | 794083 | 794083 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
etcdctl put key test
OK
etcdctl get key
key
test
etcdctl del key
1
The three-node etcd cluster is ready to serve requests.
Let's add the public Postgres Pro Shardman repository on each node:
sudo /bin/sh -c 'printf "deb [arch=amd64] http://repo.postgrespro.ru/pgprosm-14/ubuntu/ jammy main" > /etc/apt/sources.list.d/shardman.list'
sudo curl -fsSL http://repo.postgrespro.ru/pgprosm-14/keys/GPG-KEY-POSTGRESPRO | sudo apt-key add -
Next step is installation of packages (on each node):
sudo apt update
sudo apt install -y postgrespro-sdm-14-server postgrespro-sdm-14-client postgrespro-sdm-14-contrib postgrespro-sdm-14-libs pg-probackup-sdm-14 shardman-services shardman-tools
Shardman executables are installed into the /opt/pgpro/sdm-14/bin folder, and Shardman uses the etcd cluster to store its configuration.
Let's set some environment variables to simplify the cluster start process (on each node):
export PATH=$PATH:/opt/pgpro/sdm-14/bin
export SDM_STORE_ENDPOINTS=http://10.1.1.11:2379,http://10.1.1.12:2379,http://10.1.1.13:2379
export SDM_CLUSTER_NAME=cluster0
export SDM_LOG_LEVEL=debug
Setting these environment variables is equivalent to passing the corresponding shardmanctl options
(i.e., --log-level=debug, --cluster-name=cluster0, etc.). In the following examples we use environment variables.
Suppose we have chosen cluster0 as the cluster name. The next step is to specify the Shardman environment
variables in a file in the /etc/shardman folder (on each node):
sudo mkdir -p /etc/shardman
sudo sh -c 'cat << EOF > /etc/shardman/shardmand-cluster0.env
SDM_CLUSTER_NAME=cluster0
SDM_LOG_LEVEL=info
SDM_STORE_ENDPOINTS=http://10.1.1.11:2379,http://10.1.1.12:2379,http://10.1.1.13:2379
EOF'
Let’s generate a sample configuration with the Shardman utilities (on one node only).
shardmanctl config generate > spec.json
In this step, you can adjust the cluster specification (configuration), e.g., change the password or the PostgreSQL shared_buffers parameter.
Now for the final steps. First, let's initialize the cluster configuration in etcd (on any single node).
shardmanctl init -f spec.json
Expected output is:
2023-04-18T12:30:03.043Z DEBUG cmd/common.go:100 Waiting for metadata lock...
2023-04-18T12:30:03.048Z DEBUG cluster/cluster.go:365 DataDir is not specified, setting to default /var/lib/pgpro/sdm-14/data
Enable and start the shardmand service (on each node):
sudo systemctl enable --now shardmand@cluster0
sudo systemctl status shardmand@cluster0
● shardmand@cluster0.service - deployment daemon for shardman
Loaded: loaded (/lib/systemd/system/shardmand@.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-04-18 12:28:18 UTC; 2min 13s ago
Docs: https://github.com/postgrespro/shardman
Main PID: 618 (shardmand)
Tasks: 10 (limit: 4571)
Memory: 32.0M
CPU: 422ms
CGroup: /system.slice/system-shardmand.slice/shardmand@cluster0.service
└─618 /opt/pgpro/sdm-14/bin/shardmand --cluster-name cluster0 --system-bus --user postgres
This step assumes that all previous steps completed successfully:
the etcd cluster is working properly, clocks on all hosts are synchronized,
and the shardmand daemon is running on sdm01 and sdm02.
The final step should be executed as the postgres Linux user:
sudo su - postgres
/opt/pgpro/sdm-14/bin/shardmanctl nodes add -n sdm01,sdm02 \
--cluster-name cluster0 \
--log-level debug \
--store-endpoints=http://10.1.1.11:2379,http://10.1.1.12:2379,http://10.1.1.13:2379
Expected output should be:
2023-04-18T12:43:11.300Z DEBUG cmd/common.go:100 Waiting for metadata lock...
2023-04-18T12:43:11.306Z INFO cluster/store.go:277 Checking if shardmand on all nodes have applied current cluster configuration
✓ Waiting for shardmand on node sdm01 to apply current configuration: success 0.000s
✓ Waiting for shardmand on node sdm02 to apply current configuration: success 0.000s
2023-04-18T12:43:11.307Z INFO add/case.go:112 Initting Stolon instances...
2023-04-18T12:43:11.312Z INFO add/case.go:170 Waiting for Stolon daemons to start... make sure shardmand daemons are running on the nodes
✓ Waiting for Stolon daemons of rg clover-1-sdm01: success 31.012s
✓ Waiting for Stolon daemons of rg clover-1-sdm02: success 0.012s
2023-04-18T12:43:42.336Z INFO add/case.go:187 Adding repgroups...
✓ waiting rg 1 config apply: done 7.014s
2023-04-18T12:43:49.444Z DEBUG broadcaster/worker.go:33 start broadcaster worker for repgroup id=1
2023-04-18T12:43:49.453Z DEBUG broadcaster/worker.go:51 repgroup 1 connect established
2023-04-18T12:43:49.453Z DEBUG commands/addrepgroup.go:575 waiting for extension lock...
2023-04-18T12:43:49.453Z DEBUG commands/addrepgroup.go:137 Loading schema into replication group rg 1
...
2023-04-18T12:44:25.665Z DEBUG rebalance/service.go:528 wait all tasks finish
2023-04-18T12:44:25.666Z DEBUG broadcaster/worker.go:75 finish broadcaster worker for repgroup id=1
2023-04-18T12:44:25.666Z DEBUG broadcaster/worker.go:75 finish broadcaster worker for repgroup id=2
2023-04-18T12:44:25.666Z INFO add/case.go:221 Successfully added nodes sdm01, sdm02 to the cluster
The message “Successfully added nodes sdm01, sdm02 to the cluster” means that everything is fine
and that nodes sdm01 and sdm02 are working properly.
Let's check the status of the cluster nodes:
shardmanctl status
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == STORE STATUS == │
├───────────┬──────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ store is OK │ │ │
└───────────┴──────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == TOPOLOGY STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ CROSS │ Topology placement policy is CROSS │ │ │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == METADATA STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Metadata is OK │ │ │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == SHARDMAND STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ shardmand on node sdm01 is OK │ │ sdm01 │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ shardmand on node sdm02 is OK │ │ sdm02 │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == REPLICATION GROUP STATUS == │
├───────────┬─────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm01 is OK │ clover-1-sdm01 │ │
├───────────┼─────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm02 is OK │ clover-1-sdm02 │ │
└───────────┴─────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == MASTER STATUS == │
├───────────┬──────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm01 master │ clover-1-sdm01 │ sdm01:5432 │
│ │ is running on sdm01:5432 │ │ │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm02 master │ clover-1-sdm02 │ sdm02:5432 │
│ │ is running on sdm02:5432 │ │ │
└───────────┴──────────────────────────────────────────┴─────────────────────┴─────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ == DICTIONARY STATUS == │
├───────────┬──────────────────────────────────────────┬─────────────────────┬─────────────────────┤
│ STATUS │ MESSAGE │ REPLICATION GROUP │ NODE │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm01 │ clover-1-sdm01 │ │
│ │ dictionary is OK │ │ │
├───────────┼──────────────────────────────────────────┼─────────────────────┼─────────────────────┤
│ OK │ Replication group clover-1-sdm02 │ clover-1-sdm02 │ │
│ │ dictionary is OK │ │ │
└───────────┴──────────────────────────────────────────┴─────────────────────┴─────────────────────┘
To connect to the cluster, get the cluster connection string on any cluster node (sdm01 or sdm02):
shardmanctl getconnstr
dbname=postgres host=sdm01,sdm02 password=!!!CHANGE_ME!!! port=5432,5432 user=postgres
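For scripting, the host list can be pulled out of this connection string. This is an optional sketch assuming the string format shown above (space-separated key=value pairs); the connstr value is the sample output verbatim.

```shell
# Extract the host list from a libpq-style connection string by splitting
# on spaces and keeping the host= pair (sample string from above).
connstr='dbname=postgres host=sdm01,sdm02 password=!!!CHANGE_ME!!! port=5432,5432 user=postgres'
hosts=$(printf '%s\n' "$connstr" | tr ' ' '\n' | sed -n 's/^host=//p')
echo "$hosts"
```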
And then let’s try to connect:
psql -d 'dbname=postgres host=sdm01,sdm02 password=!!!CHANGE_ME!!! port=5432,5432 user=postgres'
psql (14.7)
Type "help" for help.
postgres=#
Let's try to create a distributed table and check if everything is working properly.
postgres=# create table x(id int primary key, t text) with (distributed_by='id',num_parts=2);
CREATE TABLE
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+---------+-------------------+----------
public | x | partitioned table | postgres
public | x_0 | table | postgres
public | x_1_fdw | foreign table | postgres
(3 rows)
postgres=# \d x_0
Table "public.x_0"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | not null |
t | text | | |
Partition of: x FOR VALUES WITH (modulus 2, remainder 0)
Indexes:
"x_0_pkey" PRIMARY KEY, btree (id)
postgres=# \d x_1_fdw
Foreign table "public.x_1_fdw"
Column | Type | Collation | Nullable | Default | FDW options
--------+---------+-----------+----------+---------+-------------
id | integer | | not null | |
t | text | | | |
Partition of: x FOR VALUES WITH (modulus 2, remainder 1)
Server: shardman_rg_2
FDW options: (table_name 'x_1')
postgres=# insert into x values (1,'t'),(2,'t'),(3,'t');
INSERT 0 3
postgres=# select * from x_0;
id | t
----+---
1 | t
2 | t
(2 rows)
postgres=# select * from x_1_fdw;
id | t
----+---
3 | t
(1 row)
Everything works as expected.