Table of Contents
Query performance can be affected by many things. Some of these can be controlled by the user, while others are fundamental to the underlying design of the system. This chapter provides some hints about understanding and tuning Postgres Pro Shardman performance.
Performance tuning should be done during application development and include
an accurate choice of hardware (for example, estimating the number of CPUs and memory per
Postgres Pro Shardman cluster node or tuning your storage), OS tuning
(for example, tuning the swappiness parameter or network-related behavior) and DBMS tuning
(choosing efficient configuration). But first of all, an application should be
tested and tuned for distributed DBMS. This includes designing a distributed
schema (or converting an existing schema to a distributed one), tuning queries,
using connection poolers, caching and even checking performance issues
related to possible serialization errors or Postgres Pro Shardman node
outage. The design of the schema should include accurate selection of a sharding key
and a decision which tables should become global. Usually you select a sharding
key so that:
Most of the queries filter out most of sharded table partitions.
Sharded tables are colocated and all joins of sharded tables are equi-joins on the sharding key.
These rules allow Postgres Pro Shardman to efficiently exclude unused shards from queries and to push down joins to shards where the required data resides.
Each Postgres Pro Shardman node operates as a usual DBMS server,
so all standard recommendations for tuning PostgreSQL
for production load remain in place. You should select shared_buffers,
work_mem, effective_cache_size depending
on resources available to DBMS. Keep in mind that if the cluster topology is
set to cross, Repfactor
instances run on a single node w. When all cluster nodes are online, replicas
should not utilize a lot of CPUs. However, in case of node failure, masters for
Repfactor replication groups can become running on one server,
which can create significant load on it. While tuning the
max_connections parameter, note that each transaction can
initiate n-1 connections, where n is the
number of replication groups in the cluster. When Silk
is enabled, it is still true for transactions containing DML operations. When
Silk is disabled, it is also true for read-only transactions.
Other parameters, which you perhaps would like to tune, are foreign server
options. They can be set in FDWOptions section of
Postgres Pro Shardman configuration file.
Parameters that significantly affect Postgres Pro Shardman
performance are fetch_size, batch_size
and async_capable. When Silk
transport is not enabled, fetch_size determines the number
of records that are fetched from a remote server at once. When Silk
transport is enabled, fetch_size currently does not have
significant impact on the query execution. batch_size
specifies how many rows can be combined in a single remote INSERT
operation for a sharded table. async_capable allows asynchronous execution
and should always be turned on (which is the default).
The shardman.gt_batch_size configuration parameter
allows you to optimize the size of an intermediate buffer for
INSERT and DELETE operations on global
tables.