Shardman is composed of several software components:
PostgreSQL fork.
Modified stolon PostgreSQL manager.
Shardman extension.
Management utilities.
The Shardman configuration daemon shardman-bowl monitors the cluster configuration and manages stolon clusters, which are used to ensure high availability of shards. Both Shardman and stolon store their configuration data in etcd, so to configure Shardman, you must first deploy etcd. Perform the steps described in the next sections to get a working Shardman cluster.
After performing those steps, you will be able to use shardman-ladle and shardmanctl tools to manage the cluster, as well as the Shardman extension API to configure sharded and global tables.
Currently, Shardman packages are available for Ubuntu 20.04 LTS and Red Hat Enterprise Linux 7 and 8. Packages for more Linux distributions are coming in future releases. The Shardman configuration daemon relies heavily on systemd and will not work without it.
The installation procedure described in this section is simplified. It does not cover restricting access to the etcd cluster, which is critical in a production environment. The missing procedures are described in Section 2.6.
The installation is described on a cluster consisting of four nodes:
n1, n2, n3 and
n4 with IP addresses 192.168.1.1,
192.168.1.2, 192.168.1.3 and
192.168.1.4. Each node must be able to resolve all
names (n1-n4) to the IP
addresses of the corresponding nodes. If a command in this tutorial
must be run as root, it is prefixed with "#"; otherwise, with "$".
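If no DNS service covers these names, static entries are one option. A minimal /etc/hosts fragment for this example, identical on all four nodes (an assumption; use whatever name resolution your environment already provides):

```
192.168.1.1 n1
192.168.1.2 n2
192.168.1.3 n3
192.168.1.4 n4
```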
In this tutorial, the etcd cluster is deployed on the first three nodes
of the Shardman cluster
(n1-n3). In a production
environment, it is recommended to deploy the etcd cluster
on dedicated servers, so that a high DBMS load does not
impact etcd performance.
The Shardman cluster needs etcd to maintain consensus. While etcd can share nodes with Shardman, this is not recommended under high load because etcd uses the disk intensively and calls fsync frequently, which affects performance.
It is recommended that you use the etcd version that comes with Shardman since Shardman was tested thoroughly with this particular version.
First of all, connect the Shardman repository to your OS on both the etcd and Shardman nodes.
Then install the etcd-sdm package from the connected repository
on the etcd nodes.
After installation, stop the etcd daemon and remove its data directory. You must then reinitialize it in cluster mode.
#systemctl stop etcd-sdm
#rm -fr /var/lib/etcd-sdm/member
Modify /etc/default/etcd-sdm so that it looks like this:
# unqualified first name
ETCD_NAME=NODE_NAME
# where we actually listen for peers
ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
# where we actually listen for clients
ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
# advertise where this machine is listening for clients
ETCD_ADVERTISE_CLIENT_URLS=http://NODE_IP:2379
# --initial flags are used during bootstrapping and ignored afterwards, so it is
# ok to specify them always
# advertise where this machine is listening for peers
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://NODE_IP:2380
ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
ETCD_INITIAL_CLUSTER=n1=http://192.168.1.1:2380,\
n2=http://192.168.1.2:2380,n3=http://192.168.1.3:2380
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_DATA_DIR=/var/lib/etcd/default
ETCD_AUTO_COMPACTION_RETENTION=1
Replace NODE_NAME and NODE_IP here with the name and IP address of the
corresponding node
(n1 - n3 and
192.168.1.1 - 192.168.1.3 in this
example). Note that without the ETCD_AUTO_COMPACTION_RETENTION option,
etcd does not clean up old key versions, which leads to exhausting
the storage size limit (2 GB by default).
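The substitution above can be scripted instead of edited by hand. A minimal sketch for n1, using the values from this tutorial; it writes to a staging path (OUT is a hypothetical variable introduced here) that you would then copy to /etc/default/etcd-sdm as root on the corresponding node:

```shell
# Sketch: generate the per-node etcd-sdm environment file (values for n1 shown).
# NODE_NAME and NODE_IP must be changed for each node; OUT is a staging path.
NODE_NAME=n1
NODE_IP=192.168.1.1
OUT="${OUT:-/tmp/etcd-sdm.env}"
cat > "$OUT" <<EOF
ETCD_NAME=${NODE_NAME}
ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
ETCD_ADVERTISE_CLIENT_URLS=http://${NODE_IP}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://${NODE_IP}:2380
ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
ETCD_INITIAL_CLUSTER=n1=http://192.168.1.1:2380,n2=http://192.168.1.2:2380,n3=http://192.168.1.3:2380
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_DATA_DIR=/var/lib/etcd/default
ETCD_AUTO_COMPACTION_RETENTION=1
EOF
```

Run it with the appropriate NODE_NAME and NODE_IP on each of the three etcd nodes, then review the result before installing it.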
Now you can start the cluster by simultaneously running this command on all nodes:
#systemctl start etcd-sdm
Check the cluster status with the etcdctl member list command.
It must show all members:
$etcdctl member list
2336dfd25b20b001, started, n1, http://n1:2380, http://n1:2379
7d9de73b740c2907, started, n2, http://n2:2380, http://n2:2379
a1990623ad44419c, started, n3, http://n3:2380, http://n3:2379
All the software can be installed from the Shardman
repository.
Install the shardman-services package
on the Shardman nodes using your OS package manager;
all other required packages will be installed as dependencies.
Since all executables are installed into
/opt/pgpro/sdm-14/bin, we advise you to add this directory to your
PATH:
$echo 'export PATH=/opt/pgpro/sdm-14/bin:$PATH' >> ~/.profile
$export PATH=/opt/pgpro/sdm-14/bin:$PATH
Add /opt/pgpro/sdm-14/bin to the PATH variable in
/etc/profile if you want all users to be able to use the
Shardman utilities.
Current Shardman packages assume that one
Shardman cluster, named cluster0, is
configured. Cluster configuration is stored in etcd. To tell the
Shardman configuration daemon
shardman-bowl how to connect to etcd,
edit the /var/lib/pgpro/sdm-14/data/bowl.env
file to set the SDM_STORE_ENDPOINTS variable on all
nodes. In this example, this file will contain:
SDM_STORE_ENDPOINTS=http://n1:2379,http://n2:2379,http://n3:2379
Now restart shardman-bowl services on all nodes to update their configuration:
#systemctl restart shardman-bowl.service
Create the initialization file for the Shardman cluster.
You can use the /opt/pgpro/sdm-14/examples/sdmspec.json file as
a template. Copy it somewhere and edit as necessary. You will need to change
PgSuPassword, PgReplPassword and,
most likely, the pgHBA configuration. Note that
the StolonSpec object is the stolon
configuration file.
Additional parameters specify the paths to different software components,
such as the paths to stolon or
PostgreSQL binaries, and must match your
specific installation. (See sdmspec.json
file reference for more information).
After creating the initialization file, pass the etcd connection data and path to this file to shardman-ladle to initialize the cluster (shardman-ladle manages the whole cluster, so its commands are global):
$shardman-ladle --store-endpoints http://n1:2379,http://n2:2379,http://n3:2379 init -f sdmspec.json
Here all store endpoints should be specified explicitly, as this information is
not only used to connect to the etcd cluster, but is also stored in etcd
and later used by Shardman
services. In subsequent commands you can omit this parameter if shardman-ladle
is run on one of the etcd cluster nodes (n1 - n3 in this
case).
Use shardman-ladle to add a few nodes to the
cluster. With the default replication factor of 1 (which
means that each master has exactly one replica), add nodes
in pairs, one pair after another. A master and its replicas make up
a replication group.
$shardman-ladle addnodes -n n1,n2,n3,n4
The tool modifies the cluster configuration and waits for shardman-bowl to launch the necessary services.
If the nodes are added successfully, you will be able to connect to any node in the cluster as the user specified in the initialization file.
To get the libpq connection string to be used to connect to the
cluster as administrator, use the shardmanctl getconnstr
command:
$shardmanctl getconnstr
dbname=postgres host=n2,n1,n4,n3,n3,n4,n1,n2 password=ChangeMe port=5432,5433,5432,5433,5432,5433,5432,5433 user=postgres
You can connect to the Shardman cluster using this connection string.
Query the shardman.repgroups table to check whether the replication group
you are connected to knows about other replication groups.
$psql -d 'dbname=postgres host=n2,n1,n4,n3,n3,n4,n1,n2 password=ChangeMe port=5432,5433,5432,5433,5432,5433,5432,5433 user=postgres'
postgres=# select * from shardman.repgroups;
 id | srvid
----+-------
  2 |
  1 | 16510
  3 | 16512
  4 | 16514
(4 rows)
From this output, you can see that there are 4 replication groups in the
cluster and that we are connected to the second one
(srvid is NULL for id 2).
For other replication groups, srvid is the OID of
the foreign server from the pg_foreign_server table
corresponding to this replication group.
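To resolve those OIDs to server names, the OIDs can be joined with the standard PostgreSQL catalog directly. The query below is only an illustration, not part of the Shardman API; the shardman.repgroups columns are as shown above:

```sql
select rg.id, fs.srvname
from shardman.repgroups rg
    left join pg_foreign_server fs on fs.oid = rg.srvid
order by rg.id;
```

The row with a NULL srvname is the replication group you are currently connected to.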
Now we can create a sharded table:
CREATE TABLE pgbench_branches (
bid integer NOT NULL PRIMARY KEY,
bbalance integer,
filler character(88)
)
WITH (distributed_by = 'bid', num_parts = 8);
We can see that the sharded table is a partitioned table with 8 partitions. From the perspective of the current DBMS instance, some of its partitions are regular tables (r), while others are foreign tables (f). Note that 8 partitions may be insufficient for a production cluster.
postgres=# \d+ pgbench_branches
                             Partitioned table "public.pgbench_branches"
  Column  |     Type      | Collation | Nullable | Default | Storage  | Stats target | Description
----------+---------------+-----------+----------+---------+----------+--------------+-------------
 bid      | integer       |           | not null |         | plain    |              |
 bbalance | integer       |           |          |         | plain    |              |
 filler   | character(88) |           |          |         | extended |              |
Partition key: HASH (bid)
Indexes:
    "pgbench_branches_pkey" PRIMARY KEY, btree (bid)
Partitions: pgbench_branches_0 FOR VALUES WITH (modulus 8, remainder 0),
            pgbench_branches_1_fdw FOR VALUES WITH (modulus 8, remainder 1),
            pgbench_branches_2 FOR VALUES WITH (modulus 8, remainder 2),
            pgbench_branches_3_fdw FOR VALUES WITH (modulus 8, remainder 3),
            pgbench_branches_4 FOR VALUES WITH (modulus 8, remainder 4),
            pgbench_branches_5_fdw FOR VALUES WITH (modulus 8, remainder 5),
            pgbench_branches_6 FOR VALUES WITH (modulus 8, remainder 6),
            pgbench_branches_7_fdw FOR VALUES WITH (modulus 8, remainder 7)

postgres=# select relname, relkind from pg_class where relname like 'pgbench_branches%' and relkind in ('f', 'r') order by relname;
        relname         | relkind
------------------------+---------
 pgbench_branches_0     | r
 pgbench_branches_1_fdw | f
 pgbench_branches_2     | r
 pgbench_branches_3_fdw | f
 pgbench_branches_4     | r
 pgbench_branches_5_fdw | f
 pgbench_branches_6     | r
 pgbench_branches_7_fdw | f
(8 rows)
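Rows inserted through the parent table are routed to the matching partition, whether local or foreign, by the hash of the partition key. A quick sketch (table and connection as above; the generate_series load is only an illustration):

```sql
-- insert 100 rows; each lands in the partition selected by hash(bid)
insert into pgbench_branches (bid, bbalance)
    select g, 0 from generate_series(1, 100) g;

-- a query on the parent table reads local and foreign partitions transparently
select count(*) from pgbench_branches;
```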