Shardman is a PostgreSQL-based distributed database management system (DBMS) that implements sharding. Sharding is a database design principle where rows of a table are held separately in different databases that are potentially managed by different DBMS instances. The main purpose of Shardman is to make querying sharded distributed databases efficient and to reduce the complexity of managing them.
Shardman is composed of several software components:
PostgreSQL 14 DBMS with a set of patches.
Shardman extension.
Management tools and services, including a built-in stolon manager to provide high availability.
The working volume of data does not fit in the RAM of one server, but it does fit when split across several shards (or at least reads can be parallelized).
The number of sessions is too large for one instance of PostgreSQL.
Intensive writing to WAL takes place.
Complex logic consumes too much CPU, and one server is not enough.
If a single PostgreSQL server can handle the memory, session, and CPU load, using it will be both faster and simpler. (This applies to testing too!)
A minimum of three nodes is required to deploy Shardman. One node is required for an etcd cluster (a single-node etcd cluster), and a minimum of two nodes is required for the RDBMS cluster. It is possible to reduce the minimum deployment to two nodes by placing etcd on one of the RDBMS cluster nodes. The minimal deployment is described in section Get Started with Shardman.
Yes, Shardman is fault-tolerant at the level of each shard. Each shard is a fault-tolerant cluster.
In Shardman, tables are divided into partitions, and the partitions are distributed between shards.
No, the number of partitions of a sharded table is set when the table is created and remains unchanged. If you expect the amount of data to grow significantly, you should create the necessary number of partitions (20 by default) in advance.
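To illustrate why a fixed partition count matters, here is a minimal Python sketch of the two-step mapping described above: a row's sharding key selects a partition, and each partition lives on one shard. The CRC32 hash, the round-robin placement, and the shard names are assumptions made for the example; they are not Shardman's actual hash function or placement policy.

```python
import zlib

NUM_PARTITIONS = 20  # the default partition count mentioned above


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a sharding-key value to a partition (illustrative hash, not Shardman's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def place_partitions(num_partitions: int, shards: list[str]) -> dict[int, str]:
    """Assign partitions to shards round-robin (an assumed placement policy)."""
    return {p: shards[p % len(shards)] for p in range(num_partitions)}


# Hypothetical shard names, used only for this example.
placement = place_partitions(NUM_PARTITIONS, ["shard-1", "shard-2", "shard-3"])
p = partition_for("customer-42")
print(f"key 'customer-42' -> partition {p} -> {placement[p]}")
```

Because whole partitions are the unit of placement in this model, a table with 20 partitions can spread over at most 20 shards, which is why the partition count should be chosen with future growth in mind.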
No, Shardman currently does not support automatic change of a sharding key. In order to change the sharding key, you need to create new tables with a new sharding key and migrate data from old tables to new ones.
No, Shardman currently does not support this feature.
At a minimum, a Shardman cluster can consist of a single node without fault tolerance, but such a configuration makes little sense. You can add or remove shards, and Shardman will automatically redistribute data between nodes (this default behavior is adjustable). Replicas can also be added to Shardman to make the shards fault-tolerant.
When adding new shards, data will be redistributed between all shards, including new ones.
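As a rough illustration of this redistribution, the Python sketch below moves whole partitions from the fullest shards to the emptiest ones until the partition counts differ by at most one. This greedy strategy and the shard names are assumptions chosen for the example; the source does not describe the algorithm Shardman actually uses.

```python
def rebalance(placement: dict[int, str], shards: list[str]) -> dict[int, str]:
    """Even out partition counts by moving whole partitions between shards.

    `shards` must list every shard that appears in `placement`, plus any new ones.
    """
    by_shard: dict[str, list[int]] = {s: [] for s in shards}
    for partition, shard in sorted(placement.items()):
        by_shard[shard].append(partition)

    while True:
        fullest = max(shards, key=lambda s: len(by_shard[s]))
        emptiest = min(shards, key=lambda s: len(by_shard[s]))
        if len(by_shard[fullest]) - len(by_shard[emptiest]) <= 1:
            break
        # Move one partition from the fullest shard to the emptiest one.
        by_shard[emptiest].append(by_shard[fullest].pop())

    return {p: s for s, parts in by_shard.items() for p in parts}


# 20 partitions spread over two shards, then a third (hypothetical) shard is added.
old = {p: ["shard-1", "shard-2"][p % 2] for p in range(20)}
new = rebalance(old, ["shard-1", "shard-2", "shard-3"])
moved = [p for p in old if old[p] != new[p]]
print(f"{len(moved)} of {len(old)} partitions moved")
```

In this model only the partitions that change owner are moved; the remaining partitions stay where they are.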
Shardman can be accessed through any node in the cluster because all nodes in the cluster are equal. Use the shardmanctl getconnstr command to get the cluster connection string.
There is no built-in balancing solution at the moment. However, you can organize balancing at the application level; for example, see the JDBC driver options (loadBalanceHosts). For libpq, this functionality will be implemented in the PostgreSQL 16 release.
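One way to approximate application-level balancing is sketched below in Python: the application shuffles the list of cluster nodes and builds a multi-host libpq connection URI, so each new connection tries the nodes in a different random order. The node names, port, database, and user are hypothetical, and the helper `balanced_conninfo` is not part of Shardman or libpq; the sketch relies only on the multi-host URI syntax that libpq accepts.

```python
import random

# Hypothetical node addresses; in practice they would come from the
# cluster connection string.
NODES = [("node1", 5432), ("node2", 5432), ("node3", 5432)]


def balanced_conninfo(hosts, dbname, user, rng=random):
    """Return a multi-host libpq URI with the hosts in random order."""
    shuffled = list(hosts)
    rng.shuffle(shuffled)  # a different first host for each new connection
    hostlist = ",".join(f"{host}:{port}" for host, port in shuffled)
    return f"postgresql://{user}@{hostlist}/{dbname}"


print(balanced_conninfo(NODES, "postgres", "app"))
```

The resulting string can be passed to any libpq-based client. Because libpq tries the listed hosts in order, randomizing the order on the application side spreads new connections across the nodes.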