In a distributed database managed by Shardman the following special table types are used: sharded tables and global tables.
Sharded tables are just usual PostgreSQL partitioned tables where a few partitions, making
up a shard, are regular local tables and the other partitions are
foreign tables available from remote servers via postgres_fdw.
Sharded tables are registered in the shardman.sharded_tables dictionary.
Use the CREATE TABLE
statement with the distributed_by parameter to create a sharded table.
Several sharded tables can be created as colocated. This means
that they have the same number of partitions and that their partitions
corresponding to the same sharding key should reside together.
During a rebalance, Shardman management utilities
ensure that corresponding partitions of colocated tables are moved to the same node.
(Such a rebalance happens, for example, when a new node is added to the cluster).
Colocation is necessary to ensure that joins of several tables are
propagated to the node where the actual data resides. To define
one sharded table colocated with another one, first, create one table
and then use the colocate_with parameter of the
CREATE TABLE statement while creating the second table.
Chains of colocated tables are not supported, all related
tables should be marked as colocated to one of the tables instead. Note that colocate_with
property is symmetric and transitive.
A sharded table consists of several partitions. Some of them are regular tables,
and others are foreign tables. By default, the number of partitions is determined
by the shardman.num_parts parameter, but it can be overwritten by
the num_parts CREATE TABLE parameter.
Most of DDL operations are restricted on partitions of a
sharded table. You should modify the parent table instead.
The number of partitions in a sharded table is defined when it is created and
cannot be changed afterwards. When new nodes are added to the cluster,
some partitions are moved from existing nodes to the new ones to balance the
load. So, to allow scaling of clusters, the initial number of partitions should
be high enough, but not too high since an extremely large number of partitions
significantly slows down query planning. For example, if you expect the number
of nodes in your cluster to grow by 4 times at a maximum, create sharded tables
with the number of partitions equal to 4 * N, where
N is the number of nodes. A cluster becomes unable to scale when the
number of cluster nodes reaches the number of partitions in the sharded table with
the minimal number of them.
Partitions of a sharded table can be partitioned by range. In this case, each
partition of a sharded table is a partitioned table consisting only of
regular or only of foreign subpartitions. All subpartitions
of a partition are located on the same node. Use the partition_by
CREATE TABLE parameter to specify a column that should
be used as a subpartition key column and the partition_bounds
parameter to set bounds of the second-level table partitions. New subpartitions can be added
or removed from a table as necessary. So you can omit the partition_bounds
parameter during table creation and create partitions later using the
shardman.create_subpart()
function. Other subpartition management functions allow you to drop, detach
or attach subpartitions of a sharded table. Subpartition management
is cluster-wide.
Global tables are available to all nodes of a cluster. Now a global table is a set of
regular tables synchronized by triggers. The main use case for a global table is to
store a relatively rarely updated set of data that is used by all cluster nodes.
When a sharded table is joined to a global table, joins between sharded table partitions and
the global table can be performed on nodes where individual partitions reside.
The implementation of trigger-based replication requires a non-deferrable primary key on a global
table to be defined. Currently when a global table is modified, an after-statement trigger
fires and propagates changes to other nodes of the cluster via foreign tables.
When new nodes are added to a cluster, global table data is transferred to
the new nodes via logical replication. When some nodes are removed from a cluster,
global tables get locked for writes for a brief time. Use the global
CREATE TABLE parameter to create a global table.
Global tables are registered in the shardman.global_tables dictionary.
Partitioned global tables are not supported.
Shardman extension allows creating several kinds of global objects. These are sharded and global tables, roles and tablespaces. The list of operations allowed on global objects is limited particularly to protect consistency of a global schema. For the same reason, most operations on global objects are cluster-wide. The list of cluster-wide operations includes:
CREATE for sharded and global tables, global roles and tablespaces or indexes
on sharded or global tables.
DROP for sharded and global tables, global roles and tablespaces or indexes
on sharded or global tables.
ALTER TABLE for sharded and global tables.
ALTER TABLESPACE for global tablespaces.
ALTER ROLE for global roles.
RENAME for sharded and global tables or indexes on them.
SET CONSTRAINTS ALL inside a transaction block.
These configuration settings control execution of the distributed DDL: shardman.broadcast_ddl and shardman.sync_schema. The first one can be used for a cluster-wide broadcast of all regular DDL operations (for example, creating schemas or functions). The second one controls broadcasting of statements related to global objects and should never be turned off without consulting the Postgres Pro Shardman support team.