4.2. Scaling the Cluster

4.2.1. Adding and removing a node
4.2.2. Rebalancing the data

The Shardman architecture allows you to scale out your cluster without any downtime. This section describes how to add nodes to a Shardman cluster when it no longer meets your performance or storage capacity requirements, and how to rebalance the data afterwards.

4.2.1. Adding and removing a node

How nodes are added to a cluster and where their replicas are located depend on the type of the high-availability configuration. Shardman supports two such configurations: cross-replication mode and manual topology mode. The PlacementPolicy parameter in sdmspec.json selects the cluster behavior and accepts two values, cross and manual, with cross being the default. Example:

                    {
                        "LadleSpec": {
                            "PlacementPolicy": "cross",
                            "Repfactor": 1,
                            ...
                        },
                        ...
                    }
                

4.2.1.1. Cross replication

The shardmanctl nodes add command adds new nodes to a Shardman cluster. With the cross placement policy, nodes are added to a cluster in groups called clovers. Each node in a clover runs its own primary DBMS instance and replicas of the other nodes in the clover. The number of replicas is determined by the Repfactor configuration parameter, so each clover consists of Repfactor + 1 nodes and can withstand the loss of Repfactor nodes. An example of creating a cluster of four nodes with Repfactor=1 and cross replication is shown below:

                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 init -f sdmspec.json
                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 nodes add -n n1,n2,n3,n4
                    

To view the topology of the cluster, run:

                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 cluster topology
                    

Command output:

                        ┌─────────────────────────────────────────────────────────────────────┐
                        │             == REPLICATION GROUP clover-1-n1, RGID - 1 ==           │
                        ├─────────────────────────┬─────────────────┬─────────────────────────┤
                        │           HOST          │       PORT      │          STATUS         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n1           │       5432      │         PRIMARY         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n2           │       5433      │         STANDBY         │
                        └─────────────────────────┴─────────────────┴─────────────────────────┘
                        ┌─────────────────────────────────────────────────────────────────────┐
                        │             == REPLICATION GROUP clover-1-n2, RGID - 2 ==           │
                        ├─────────────────────────┬─────────────────┬─────────────────────────┤
                        │           HOST          │       PORT      │          STATUS         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n1           │       5433      │         STANDBY         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n2           │       5432      │         PRIMARY         │
                        └─────────────────────────┴─────────────────┴─────────────────────────┘
                        ┌─────────────────────────────────────────────────────────────────────┐
                        │             == REPLICATION GROUP clover-2-n3, RGID - 3 ==           │
                        ├─────────────────────────┬─────────────────┬─────────────────────────┤
                        │           HOST          │       PORT      │          STATUS         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n3           │       5432      │         PRIMARY         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n4           │       5433      │         STANDBY         │
                        └─────────────────────────┴─────────────────┴─────────────────────────┘
                        ┌─────────────────────────────────────────────────────────────────────┐
                        │             == REPLICATION GROUP clover-2-n4, RGID - 4 ==           │
                        ├─────────────────────────┬─────────────────┬─────────────────────────┤
                        │           HOST          │       PORT      │          STATUS         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n3           │       5433      │         STANDBY         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n4           │       5432      │         PRIMARY         │
                        └─────────────────────────┴─────────────────┴─────────────────────────┘
                    

The shardmanctl nodes rm command removes nodes from a Shardman cluster. It removes the clovers containing the specified nodes; the last clover in the cluster cannot be removed. Any data (such as partitions of sharded relations) residing on the removed replication groups is migrated to the remaining replication groups using logical replication, and all references to the removed replication groups (including foreign server definitions) are removed from the metadata of the remaining replication groups. Finally, the metadata in etcd is updated.

                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 nodes rm -n n3
                    

View the cluster topology again:

                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 cluster topology
                    

Command output:

                        ┌─────────────────────────────────────────────────────────────────────┐
                        │             == REPLICATION GROUP clover-1-n1, RGID - 1 ==           │
                        ├─────────────────────────┬─────────────────┬─────────────────────────┤
                        │           HOST          │       PORT      │          STATUS         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n1           │       5432      │         PRIMARY         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n2           │       5433      │         STANDBY         │
                        └─────────────────────────┴─────────────────┴─────────────────────────┘
                        ┌─────────────────────────────────────────────────────────────────────┐
                        │             == REPLICATION GROUP clover-1-n2, RGID - 2 ==           │
                        ├─────────────────────────┬─────────────────┬─────────────────────────┤
                        │           HOST          │       PORT      │          STATUS         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n1           │       5433      │         STANDBY         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n2           │       5432      │         PRIMARY         │
                        └─────────────────────────┴─────────────────┴─────────────────────────┘
                    

4.2.1.2. Manual topology

In manual topology mode, use the shardmanctl nodes add command to add primary nodes to a cluster; it adds the listed nodes as primaries, creating a separate replication group for each of them. Create a cluster with three primary nodes and manual topology (PlacementPolicy=manual in sdmspec.json):

                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 init -f sdmspec.json
                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 nodes add -n n1,n2,n3
                    

To view the topology of a cluster, use the shardmanctl cluster topology command:

                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 cluster topology
                    

Command output:

                        ┌────────────────────────────────────────────────────────────────────────┐
                        │              == REPLICATION GROUP clover-1-n1, RGID - 1 ==             │
                        ├──────────────────────────┬──────────────────┬──────────────────────────┤
                        │           HOST           │       PORT       │          STATUS          │
                        ├──────────────────────────┼──────────────────┼──────────────────────────┤
                        │            n1            │       5432       │          PRIMARY         │
                        └──────────────────────────┴──────────────────┴──────────────────────────┘
                        ┌────────────────────────────────────────────────────────────────────────┐
                        │              == REPLICATION GROUP clover-2-n2, RGID - 2 ==             │
                        ├──────────────────────────┬──────────────────┬──────────────────────────┤
                        │           HOST           │       PORT       │          STATUS          │
                        ├──────────────────────────┼──────────────────┼──────────────────────────┤
                        │            n2            │       5432       │          PRIMARY         │
                        └──────────────────────────┴──────────────────┴──────────────────────────┘
                        ┌────────────────────────────────────────────────────────────────────────┐
                        │              == REPLICATION GROUP clover-3-n3, RGID - 3 ==             │
                        ├──────────────────────────┬──────────────────┬──────────────────────────┤
                        │           HOST           │       PORT       │          STATUS          │
                        ├──────────────────────────┼──────────────────┼──────────────────────────┤
                        │            n3            │       5432       │          PRIMARY         │
                        └──────────────────────────┴──────────────────┴──────────────────────────┘
                    

Add nodes n4, n5, and n6 as replicas using the shardmanctl shard add command:

                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 shard --shard clover-1-n1 add -n n4
                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 shard --shard clover-2-n2 add -n n5
                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 shard --shard clover-3-n3 add -n n6
                    

As a result, we get the following cluster configuration:

                        ┌─────────────────────────────────────────────────────────────────────┐
                        │             == REPLICATION GROUP clover-1-n1, RGID - 1 ==           │
                        ├─────────────────────────┬─────────────────┬─────────────────────────┤
                        │           HOST          │       PORT      │          STATUS         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n1           │       5432      │         PRIMARY         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n4           │       5432      │         STANDBY         │
                        └─────────────────────────┴─────────────────┴─────────────────────────┘
                        ┌─────────────────────────────────────────────────────────────────────┐
                        │             == REPLICATION GROUP clover-2-n2, RGID - 2 ==           │
                        ├─────────────────────────┬─────────────────┬─────────────────────────┤
                        │           HOST          │       PORT      │          STATUS         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n2           │       5432      │         PRIMARY         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n5           │       5432      │         STANDBY         │
                        └─────────────────────────┴─────────────────┴─────────────────────────┘
                        ┌─────────────────────────────────────────────────────────────────────┐
                        │             == REPLICATION GROUP clover-3-n3, RGID - 3 ==           │
                        ├─────────────────────────┬─────────────────┬─────────────────────────┤
                        │           HOST          │       PORT      │          STATUS         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n3           │       5432      │         PRIMARY         │
                        ├─────────────────────────┼─────────────────┼─────────────────────────┤
                        │            n6           │       5432      │         STANDBY         │
                        └─────────────────────────┴─────────────────┴─────────────────────────┘
                    

To remove a replica, run the shardmanctl shard rm command, for example:

                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 shard --shard clover-1-n1 rm -n n4
                    

To remove a primary, first run the shardmanctl shard switch command to switch the primary role over to one of its replicas, and then remove the former primary node.

                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 shard --shard clover-1-n1 switch --new-primary n4
                    
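
Assuming that after the switchover the former primary n1 remains in the clover-1-n1 replication group as a replica, it can then be removed with the shard rm command shown above:

                        $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 shard --shard clover-1-n1 rm -n n1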

4.2.2. Rebalancing the data

The rebalance process starts automatically after nodes are added (unless the --no-rebalance option is specified) and before a node is removed; it can also be started manually. Its purpose is to evenly distribute the partitions of each sharded table between the replication groups.

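For example, to add new nodes without triggering the automatic rebalance (so that it can be started manually later, as described below), pass the --no-rebalance option to the nodes add command; the node names here are only illustrative:

                    $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 nodes add -n n5,n6 --no-rebalance
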
For each sharded table, the rebalancing process iteratively determines the replication groups with the maximum and the minimum number of partitions and creates a task to move one partition from the former to the latter. This is repeated while max - min > 1. Partitions are moved using logical replication. Partitions of co-located tables are moved together with the partitions of the distributed tables that they reference.

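A minimal sketch of this greedy strategy is shown below in Go. It is not the shardmanctl implementation: partition counts per replication group are modeled as a plain map, the replication group IDs and counts are illustrative, and each planned move transfers a single partition from the most loaded group to the least loaded one.

                    // rebalance_sketch.go: a simplified model of the greedy rebalancing
                    // strategy described above (not the actual shardmanctl code).
                    package main

                    import "fmt"

                    // plan returns single-partition moves ({source RGID, target RGID})
                    // until the most and least loaded replication groups differ by at
                    // most one partition, i.e. while max - min > 1.
                    func plan(parts map[int]int) [][2]int {
                        var moves [][2]int
                        for {
                            maxRG, minRG := -1, -1
                            for rg, n := range parts {
                                if maxRG == -1 || n > parts[maxRG] {
                                    maxRG = rg
                                }
                                if minRG == -1 || n < parts[minRG] {
                                    minRG = rg
                                }
                            }
                            if parts[maxRG]-parts[minRG] <= 1 {
                                return moves // balanced
                            }
                            // Shardman performs the actual move with logical replication;
                            // here we only record the planned move.
                            parts[maxRG]--
                            parts[minRG]++
                            moves = append(moves, [2]int{maxRG, minRG})
                        }
                    }

                    func main() {
                        // Hypothetical partition counts of one sharded table right after
                        // a new replication group (RGID 3) was added and is still empty.
                        counts := map[int]int{1: 10, 2: 10, 3: 0}
                        for _, m := range plan(counts) {
                            fmt.Printf("move one partition: RG %d -> RG %d\n", m[0], m[1])
                        }
                        fmt.Println("resulting distribution:", counts)
                    }
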
It is important to remember that max_logical_replication_workers should be rather high, since the rebalance process uses up to min(max_replication_slots, max_logical_replication_workers, max_worker_processes, max_wal_senders)/3 concurrent threads. In practice, you can set max_logical_replication_workers = Repfactor + 3 * task_num, where task_num is the number of parallel rebalance tasks.
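
For example, with Repfactor = 1 and four parallel rebalance tasks, this rule of thumb gives max_logical_replication_workers = 1 + 3 * 4 = 13.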

To rebalance sharded tables in the cluster0 cluster manually, run the following command (here etcd1, etcd2, and etcd3 are etcd cluster nodes):

                    $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 rebalance
                

If the rebalance process ends with an error, run the shardmanctl cleanup command with the --after-rebalance option.
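
For example, assuming the same etcd endpoints as above:

                    $ shardmanctl --store-endpoints http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 cleanup --after-rebalance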