In a distributed database managed by Postgres Pro Shardman the following special table types are used: sharded tables and global tables.
Sharded tables are just usual PostgreSQL partitioned tables where a few partitions, making
up a shard, are regular local tables and the other partitions are
foreign tables available from remote servers.
Sharded tables are registered in the shardman.sharded_tables dictionary.
Use the CREATE TABLE
statement with the distributed_by parameter to create a sharded table.
Several sharded tables can be created as colocated. This means
that they have the same number of partitions and that their partitions
corresponding to the same sharding key should reside together.
During a rebalance, Postgres Pro Shardman management utilities
ensure that corresponding partitions of colocated tables are moved to the same node.
(Such a rebalance happens, for example, when a new node is added to the cluster).
Colocation is necessary to ensure that joins of several tables are
propagated to the node where the actual data resides. To define
one sharded table colocated with another one, first, create one table
and then use the colocate_with parameter of the
CREATE TABLE statement while creating the second table.
Chains of colocated tables are not supported, all related
tables should be marked as colocated to one of the tables instead. Note that colocate_with
property is symmetric and transitive.
Mind some limitations for distributed systems. Temporary tables cannot participate in distributed transactions. Regular Postgres Pro Shardman can be joined with sharded tables on local nodes, yet they cannot have common keys or indexes with them.
A sharded table consists of several partitions. Some of them are regular tables,
and others are foreign tables. By default, the number of partitions is determined
by the shardman.num_parts parameter, but it can be overwritten by
the num_parts CREATE TABLE parameter.
Most of DDL operations are restricted on partitions of a
sharded table. You should modify the parent table instead.
The number of partitions in a sharded table is defined when it is created and
cannot be changed afterwards. When new nodes are added to the cluster,
some partitions are moved from existing nodes to the new ones to balance the
load. So, to allow scaling of clusters, the initial number of partitions should
be high enough, but not too high since an extremely large number of partitions
significantly slows down query planning. For example, if you expect the number
of nodes in your cluster to grow by 4 times at a maximum, create sharded tables
with the number of partitions equal to 4 * N, where
N is the number of nodes. A cluster becomes unable to scale when the
number of cluster nodes reaches the number of partitions in the sharded table with
the minimal number of them.
Global tables are available to all nodes of a cluster. Now a global table is a set of
regular tables synchronized by triggers. The main use case for a global table is to
store a relatively rarely updated set of data that is used by all cluster nodes.
When a sharded table is joined to a global table, joins between sharded table partitions and
the global table can be performed on nodes where individual partitions reside.
The implementation of trigger-based replication requires a non-deferrable primary key on a global
table to be defined. Currently when a global table is modified, an after-statement trigger
fires and propagates changes to other nodes of the cluster via foreign tables.
When new nodes are added to a cluster, global table data is transferred to
the new nodes via logical replication. When some nodes are removed from a cluster,
global tables get locked for writes for a brief time. Use the global
CREATE TABLE parameter to create a global table.
Global tables are registered in the shardman.global_tables dictionary.
Partitioned global tables are not supported.
Postgres Pro Shardman extension allows creating several kinds of global objects. These are sharded and global tables, roles, sequences and tablespaces. The list of operations allowed on global objects is limited particularly to protect consistency of a global schema. For the same reason, most operations on global objects are cluster-wide. The list of cluster-wide operations includes:
CREATE for sharded and global tables, global roles and tablespaces or indexes
on sharded or global tables.
DROP for sharded and global tables, global roles and tablespaces or indexes
on sharded or global tables.
ALTER TABLE for sharded and global tables.
ALTER TABLESPACE for global tablespaces.
ALTER ROLE for global roles.
RENAME for sharded and global tables or indexes on them.
SET CONSTRAINTS ALL inside a transaction block.
These configuration settings control execution of the distributed DDL: shardman.broadcast_ddl and shardman.sync_schema. The first one can be used for a cluster-wide broadcast of all regular DDL operations (for example, creating schemas or functions). The second one controls broadcasting of statements related to global objects and should never be turned off without consulting the Postgres Pro Shardman support team.
Distributed tables have some specifics. Such table is available on all cluster nodes. As partitioned table, it has a number of partitions, with this number being a constant that cannot be changed without a restart. Each shard usually stores several partitions of this shard, while other partitions are stored elsewhere. Mind that these partitions are interconnected.
Postgres Pro Shardman colocated table
is a partitioned table that consists of a number of partitions
that are colocated with the partitions of a distributed
table for the quick access. In this case, partitioning is
usually made with partition, sharding, or foreign key for efficiency purposes.
These are recommended solutions for larger JOIN.