4.11. Monitoring

4.11.1. Collecting Distributed Statement Statistics Using the pgpro_stats Extension

4.11.1. Collecting Distributed Statement Statistics Using the pgpro_stats Extension

During the execution of distributed queries, Shardman sends derived SQL queries to remote nodes that hold data partitions involved in the query execution. Let's call these SQL queries query fragments. Shardman sends such queries using the postgres_fdw extension. The node that queries the distributed table is named the coordinator, while the nodes that accept query fragments are named shards.

When the pgpro_stats extension is enabled on a Shardman cluster node, it collects statistics about local queries only. Information about distributed queries, initiated by this node, is incomplete because it misses data about remote query fragments. The statistics concerning queries initiated by other nodes is also ambiguous because there is no simple way for a user to determine the distributed query to which the fragment corresponds.

To address these issues, pgpro_stats for Shardman includes additional features.

When a node receives a query fragment, it saves its statistics into a separate shared hash table. This information is not returned when querying the pgpro_stats_statements view. Periodically and asynchronously, each node sends this information from the separate table to the coordinator corresponding to the query. The coordinator aggregates the statistical data obtained from query fragments with the statistics of its parent query, which is the query initiated by the client.

The pgpro_stats extension starts a separate background worker. This worker is responsible for sending the accumulated statistics to the coordinator nodes either every 5 seconds or when triggered by the guard latch. The collecting function sets this latch when the hash table is almost full.

To reduce network traffic initiated by statistics sender there is a compression applied to statistics data sent. Compression method can be selected by the GUC pgpro_stats.transport_compression.

Each node stores the integral number of statistics entries received from the shard node and the timestamp of the last receive. When a coordinator node receives a statistics message, it updates these values, which are accessible using SQL interface.

There are additional pgpro_stats SQL functions introduced by Shardman additions described in Section 8.2 and configuration parameters described in the section called “pgpro_stats parameters”.