The pgpro_temp_stats extension collects extended statistics on multiple columns of temporary tables. It operates automatically during execution of queries that reference temporary tables.
The pgpro_temp_stats extension is provided
with Postgres Pro Enterprise as a separate pre-built
package pgpro-temp-stats-ent-17
(for the detailed installation instructions, see
Chapter 17).
To enable pgpro_temp_stats, complete the following steps:
Add the library name to the
shared_preload_libraries variable in the
postgresql.conf file.
shared_preload_libraries = 'pgpro_temp_stats'
Reload the database server for the changes to take effect.
To verify that pgpro_temp_stats is installed correctly, you can run the following command:
SHOW shared_preload_libraries;
Create the pgpro_temp_stats extension using the following query:
CREATE EXTENSION pgpro_temp_stats;
Enable the extension by setting the pgpro_temp_stats.enable
parameter to on.
SET pgpro_temp_stats.enable = 'on';
The pgpro_temp_stats extension automatically searches for queries that reference temporary tables and collects the required statistical data. This data allows the planner to predict row cardinality for such queries more accurately and reduce query execution time.
pgpro_temp_stats performs the following actions for each temporary table in a query:
Executes the CREATE STATISTICS command for columns referenced in the query.
This command creates an extended statistics object with a name in
the following format:
stN_tablename.
Information about this created statistics object is added to the
pg_statistic_ext
system catalog as for all other statistics objects.
This step is performed only if the
pgpro_temp_stats.create_statistics parameter
is set to on (default), and the query references
two or more columns of a temporary table. You can set this parameter
to off to skip this step for all queries.
Executes the ANALYZE command for the same columns referenced in the query.
Collected statistical data is added to the pg_statistic system catalog as for manual command runs.
For instance, for a sample query referencing two columns of one temporary table, pgpro_temp_stats operation looks as follows:
SELECT * FROM t1 WHERE (a = 1) AND (b = 0); DEBUG: CREATE STATISTICS st0_t1 ON a, b FROM t1; DEBUG: ANALYZE t1 (a, b)
pgpro_temp_stats logs messages about its operation at the DEBUG1 severity level.
If the pgpro_temp_stats.skip_analyze parameter is set to
on (default), all full-table ANALYZE
commands on temporary tables that run automatically or manually are ignored
while the pgpro_temp_stats extension is enabled.
You can set this parameter to off to execute such
commands in addition to pgpro_temp_stats
operations.
pgpro_temp_stats can collect statistics for queries that join two or more temporary tables or join temporary tables with permanent tables. In these cases, pgpro_temp_stats collects statistics on columns of temporary tables that are used in join conditions.
SELECT * FROM t2 JOIN t3 ON t2.id = t3.id AND t2.val = t3.val; DEBUG: CREATE STATISTICS st0_t2 ON id, val from t2; DEBUG: ANALYZE t2 (id, val) DEBUG: CREATE STATISTICS st0_t3 ON id, val from t3; DEBUG: ANALYZE t3 (id, val)
pgpro_temp_stats can also collect statistics
for queries with WHERE clauses that use expressions
or functions referencing columns of temporary tables. In these cases,
pgpro_temp_stats can execute the
CREATE STATISTICS command both on separate columns
and on entire expressions and functions. The ANALYZE
command is executed on columns used in expressions or functions.
SELECT * FROM t4 WHERE a > 10 AND date_trunc('week', b) = date '2023-05-01';
DEBUG: CREATE STATISTICS st0_t4 ON a, date_trunc('week', b) from t4;
DEBUG: ANALYZE t4 (a, b)
pgpro_temp_stats internally stores IDs of queries for which statistical data was collected. This allows pgpro_temp_stats to avoid collecting statistics multiple times and reuse statistical data.
The pgpro_temp_stats.ts_max_items parameter specifies the maximum number of stored query IDs. If this limit is reached, pgpro_temp_stats removes IDs from its internal storage using the Least Recently Used (LRU) algorithm.
You can also use the pgpro_temp_stats_reset function to clear all query IDs in this internal storage.
This example demonstrates how the pgpro_temp_stats extension works.
-- The pgpro_temp_stats extension is disabled
-- Create a sample temporary table
CREATE TEMP TABLE t1 (
a int,
b int
);
-- Execute ANALYZE for the entire table
ANALYZE t1;
-- Add data to the created table
INSERT INTO t1 SELECT i/100, i/500 FROM generate_series(1, 10000) s(i);
-- Execute a query that references two columns of the created temporary table
-- The estimated number of rows is significantly lower than the actual number of rows
EXPLAIN ANALYZE
SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on t1 (cost=0.00..198.22 rows=1 width=8) (actual time=0.059..3.560 rows=100 loops=1)
Filter: ((a = 1) AND (b = 0))
Rows Removed by Filter: 9900
Buffers: local hit=45
Planning:
Buffers: shared hit=24
Planning Time: 0.313 ms
Execution Time: 3.594 ms
(8 rows)
-- Enable the pgpro_temp_stats extension
SET pgpro_temp_stats.enable = true;
-- Configure messages to see information about creating statistics and analyzing
SET client_min_messages = 'debug1';
-- Execute the same query. For columns referenced by this query,
-- pgpro_temp_stats automatically creates an extended statistics object and runs ANALYZE
-- As a result, the number of rows is estimated more accurate
EXPLAIN ANALYZE
SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
DEBUG: CREATE STATISTICS st0_t1 ON a, b FROM t1;
DEBUG: ANALYZE t1 (a, b)
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on t1 (cost=0.00..195.00 rows=100 width=8) (actual time=0.090..2.981 rows=100 loops=1)
Filter: ((a = 1) AND (b = 0))
Rows Removed by Filter: 9900
Buffers: local hit=45
Planning:
Buffers: shared hit=143 dirtied=6 written=6, local hit=45
Planning Time: 12.958 ms
Execution Time: 3.009 ms
(8 rows)
pgpro_temp_stats.enable (boolean)
#
Enables or disables the pgpro_temp_stats
extension. The default value is off. Only
superusers can change this setting.
pgpro_temp_stats.create_statistics (boolean)
#
Specifies whether to run the
CREATE STATISTICS
command automatically for columns of temporary tables that are
referenced by queries. The default value is on.
Only superusers can change this setting.
pgpro_temp_stats.skip_analyze (boolean)
#
Specifies whether to ignore full-table ANALYZE
commands on temporary tables that run automatically or manually
while the pgpro_temp_stats extension is
enabled. The default value is on. Only superusers can
change this setting.
pgpro_temp_stats.ts_max_items (integer)
#
Specifies the maximum number of stored IDs of queries referencing
temporary tables for which statistical data was collected. The default
value is 4096.