Shardman is composed of several software components:
PostgreSQL fork.
Modified stolon PostgreSQL manager.
Shardman extension.
Management utilities.
The Shardman configuration daemon shardman-bowl monitors the cluster configuration and manages stolon clusters, which are used to ensure high availability of shards. Both Shardman and stolon store their configuration data in etcd, so to configure Shardman, you must first deploy etcd. Perform the steps described in the next sections to get a working Shardman cluster.
After performing those steps, you will be able to use shardman-ladle and shardmanctl tools to manage the cluster, as well as the Shardman extension API to configure sharded and global tables.
Currently, Shardman packages are available for Ubuntu 20.04 LTS and Red Hat Enterprise Linux 7 and 8. Packages for more Linux distributions are coming in future releases. The Shardman configuration daemon relies heavily on systemd and will not work without it.
The installation procedure described in this section is simplified. It does not cover restricting access to the etcd cluster, which is critical in a production environment. The missing procedures are described in Section 2.6.
The installation is described on a cluster consisting of four nodes:
n1, n2, n3 and
n4 with IP addresses 192.168.1.1,
192.168.1.2, 192.168.1.3 and
192.168.1.4. Each node must be able to resolve all
names (n1-n4) to the IP
addresses of the corresponding nodes. If a command in this tutorial
must be run as root, it is prefixed with "#"; otherwise, with "$".
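If no DNS service covers these names, static entries are one option. A minimal /etc/hosts fragment for this example, identical on all four nodes (an assumption; use whatever name resolution your environment already provides):

```
192.168.1.1 n1
192.168.1.2 n2
192.168.1.3 n3
192.168.1.4 n4
```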
In this tutorial, the etcd cluster is deployed on the first three nodes
of the Shardman cluster
(n1-n3). In a production
environment, it is recommended to deploy the etcd cluster
on dedicated servers, so that a high DBMS load does not
impact etcd performance.
The Shardman cluster needs etcd to maintain consensus. While etcd can share nodes with Shardman, this is not recommended under high load because etcd uses the disk intensively and calls fsync frequently, which affects performance.
It is recommended that you use the etcd version that comes with Shardman since Shardman was tested thoroughly with this particular version.
First of all, connect the Shardman repository to your OS on both the etcd and Shardman nodes.
Then install the etcd-sdm package from the connected repository
on the etcd nodes.
After installation, stop the etcd daemon and remove its data directory. You must then reinitialize it in cluster mode.
#systemctl stop etcd-sdm
#rm -fr /var/lib/etcd-sdm/member
Modify /etc/default/etcd-sdm so that it looks like this:
# unqualified first name
ETCD_NAME=NODE_NAME
# where we actually listen for peers
ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
# where we actually listen for clients
ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
# advertise where this machine is listening for clients
ETCD_ADVERTISE_CLIENT_URLS=http://NODE_IP:2379
# --initial flags are used during bootstrapping and ignored afterwards, so it is
# ok to specify them always
# advertise where this machine is listening for peers
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://NODE_IP:2380
ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
ETCD_INITIAL_CLUSTER=n1=http://192.168.1.1:2380,\
n2=http://192.168.1.2:2380,n3=http://192.168.1.3:2380
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_DATA_DIR=/var/lib/etcd/default
ETCD_AUTO_COMPACTION_RETENTION=1
Replace NODE_NAME and NODE_IP here with the name and IP address of the
corresponding node
(n1 - n3 and
192.168.1.1 - 192.168.1.3 in this
example). Note that without the ETCD_AUTO_COMPACTION_RETENTION option,
etcd does not clean up old key versions, which leads to exhausting
the storage size limit (2 GB by default).
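The substitution above can be scripted instead of edited by hand. A minimal sketch for n1, using the values from this tutorial; it writes to a staging path (OUT is a hypothetical variable introduced here) that you would then copy to /etc/default/etcd-sdm as root on the corresponding node:

```shell
# Sketch: generate the per-node etcd-sdm environment file (values for n1 shown).
# NODE_NAME and NODE_IP must be changed for each node; OUT is a staging path.
NODE_NAME=n1
NODE_IP=192.168.1.1
OUT="${OUT:-/tmp/etcd-sdm.env}"
cat > "$OUT" <<EOF
ETCD_NAME=${NODE_NAME}
ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
ETCD_ADVERTISE_CLIENT_URLS=http://${NODE_IP}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://${NODE_IP}:2380
ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
ETCD_INITIAL_CLUSTER=n1=http://192.168.1.1:2380,n2=http://192.168.1.2:2380,n3=http://192.168.1.3:2380
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_DATA_DIR=/var/lib/etcd/default
ETCD_AUTO_COMPACTION_RETENTION=1
EOF
```

Run it with the appropriate NODE_NAME and NODE_IP on each of the three etcd nodes, then review the result before installing it.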
Now you can start the cluster by simultaneously running this command on all nodes:
#systemctl start etcd-sdm
Check the cluster status with the etcdctl member list command.
It must show all members:
$etcdctl member list
2336dfd25b20b001, started, n1, http://n1:2380, http://n1:2379
7d9de73b740c2907, started, n2, http://n2:2380, http://n2:2379
a1990623ad44419c, started, n3, http://n3:2380, http://n3:2379
All the software can be installed from the Shardman
repository.
Install the shardman-services package
on the Shardman nodes using your OS package manager;
all other required packages will be installed as dependencies.
Since all executables are installed into
/opt/pgpro/sdm-14/bin, we advise you to add this directory to your
PATH:
$echo 'export PATH=/opt/pgpro/sdm-14/bin:$PATH' >> ~/.profile
$export PATH=/opt/pgpro/sdm-14/bin:$PATH
Add /opt/pgpro/sdm-14/bin to the PATH variable in
/etc/profile if you want all users to be able to use the
Shardman utilities.
Current Shardman packages assume that one
Shardman cluster, named cluster0, is
configured. Cluster configuration is stored in etcd. To tell the
Shardman configuration daemon
shardman-bowl how to connect to etcd,
edit the /var/lib/pgpro/sdm-14/data/bowl.env
file to set the SDM_STORE_ENDPOINTS variable on all
nodes. In this example, this file will contain:
SDM_STORE_ENDPOINTS=http://n1:2379,http://n2:2379,http://n3:2379
Now restart shardman-bowl services on all nodes to update their configuration:
#systemctl restart shardman-bowl.service
Create the initialization file for the Shardman cluster.
You can use the /opt/pgpro/sdm-14/examples/sdmspec.json file as
a template. Copy it somewhere and edit as necessary. You will need to change
PgSuPassword, PgReplPassword and,
most likely, the pgHBA configuration. Note that
the StolonSpec object is the stolon
configuration file.
Additional parameters specify the paths to different software components,
such as the paths to stolon or
PostgreSQL binaries, and must match your
specific installation. (See sdmspec.json
file reference for more information).
After creating the initialization file, pass the etcd connection data and path to this file to shardman-ladle to initialize the cluster (shardman-ladle manages the whole cluster, so its commands are global):
$shardman-ladle --store-endpoints http://n1:2379,http://n2:2379,http://n3:2379 init -f sdmspec.json
Here all store endpoints should be specified explicitly, as this information is
not only used to connect to the etcd cluster, but is also stored in etcd
and later used by Shardman
services. In subsequent commands you can omit this parameter if shardman-ladle
is run on one of the etcd cluster nodes (n1 - n3 in this
case).
Use shardman-ladle to add a few nodes to the
cluster. With the default replication factor of 1 (which
means that each master has exactly one replica), add nodes
in pairs, one pair after another. A master and its replicas make up
a replication group.
$shardman-ladle addnodes -n n1,n2,n3,n4
The tool modifies the cluster configuration and waits for shardman-bowl to launch the necessary services.
If the nodes are added successfully, you will be able to connect to any node in the cluster as the user specified in the initialization file.
To get the libpq connection string to be used to connect to the
cluster as administrator, use the shardmanctl getconnstr
command:
$shardmanctl getconnstr
dbname=postgres host=n2,n1,n4,n3,n3,n4,n1,n2 password=ChangeMe port=5432,5433,5432,5433,5432,5433,5432,5433 user=postgres
You can connect to the Shardman cluster using this connection string.
Query the shardman.repgroups table to check whether the replication group
you are connected to knows about other replication groups.
$psql -d 'dbname=postgres host=n2,n1,n4,n3,n3,n4,n1,n2 password=ChangeMe port=5432,5433,5432,5433,5432,5433,5432,5433 user=postgres'
postgres=# select * from shardman.repgroups;
 id | srvid
----+-------
  2 |
  1 | 16510
  3 | 16512
  4 | 16514
(4 rows)
From this output, you can see that there are 4 replication groups in the
cluster and that we are connected to the second one
(srvid is NULL for id 2).
For other replication groups, srvid is the OID of
the foreign server from the pg_foreign_server table
corresponding to this replication group.
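To resolve those OIDs to server names, the OIDs can be joined with the standard PostgreSQL catalog directly. The query below is only an illustration, not part of the Shardman API; the shardman.repgroups columns are as shown above:

```sql
select rg.id, fs.srvname
from shardman.repgroups rg
    left join pg_foreign_server fs on fs.oid = rg.srvid
order by rg.id;
```

The row with a NULL srvname is the replication group you are currently connected to.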
Now we can create a sharded table:
CREATE TABLE pgbench_branches (
bid integer NOT NULL PRIMARY KEY,
bbalance integer,
filler character(88)
)
WITH (distributed_by = 'bid', num_parts = 8);
We can see that the sharded table is a partitioned table with 8 partitions. From the perspective of the current DBMS instance, some of its partitions are regular tables (r), while others are foreign tables (f). Note that 8 partitions may be insufficient for a production cluster.
postgres=# \d+ pgbench_branches
                             Partitioned table "public.pgbench_branches"
  Column  |     Type      | Collation | Nullable | Default | Storage  | Stats target | Description
----------+---------------+-----------+----------+---------+----------+--------------+-------------
 bid      | integer       |           | not null |         | plain    |              |
 bbalance | integer       |           |          |         | plain    |              |
 filler   | character(88) |           |          |         | extended |              |
Partition key: HASH (bid)
Indexes:
    "pgbench_branches_pkey" PRIMARY KEY, btree (bid)
Partitions: pgbench_branches_0 FOR VALUES WITH (modulus 8, remainder 0),
            pgbench_branches_1_fdw FOR VALUES WITH (modulus 8, remainder 1),
            pgbench_branches_2 FOR VALUES WITH (modulus 8, remainder 2),
            pgbench_branches_3_fdw FOR VALUES WITH (modulus 8, remainder 3),
            pgbench_branches_4 FOR VALUES WITH (modulus 8, remainder 4),
            pgbench_branches_5_fdw FOR VALUES WITH (modulus 8, remainder 5),
            pgbench_branches_6 FOR VALUES WITH (modulus 8, remainder 6),
            pgbench_branches_7_fdw FOR VALUES WITH (modulus 8, remainder 7)

postgres=# select relname, relkind from pg_class where relname like 'pgbench_branches%' and relkind in ('f', 'r') order by relname;
        relname         | relkind
------------------------+---------
 pgbench_branches_0     | r
 pgbench_branches_1_fdw | f
 pgbench_branches_2     | r
 pgbench_branches_3_fdw | f
 pgbench_branches_4     | r
 pgbench_branches_5_fdw | f
 pgbench_branches_6     | r
 pgbench_branches_7_fdw | f
(8 rows)
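Rows inserted through the parent table are routed to the matching partition, whether local or foreign, by the hash of the partition key. A quick sketch (table and connection as above; the generate_series load is only an illustration):

```sql
-- insert 100 rows; each lands in the partition selected by hash(bid)
insert into pgbench_branches (bid, bbalance)
    select g, 0 from generate_series(1, 100) g;

-- a query on the parent table reads local and foreign partitions transparently
select count(*) from pgbench_branches;
```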