G.9. pgpro_temp_stats — collect multicolumn statistics on temporary tables #

G.9.1. Description
G.9.2. Installation
G.9.3. Usage
G.9.4. Example
G.9.5. Configuration Parameters
G.9.6. Functions

G.9.1. Description #

The pgpro_temp_stats extension collects extended statistics on multiple columns of temporary tables. It operates automatically during execution of queries that reference temporary tables.

G.9.2. Installation #

The pgpro_temp_stats extension is provided with Postgres Pro Enterprise as a separate pre-built package pgpro-temp-stats-ent-17 (for the detailed installation instructions, see Chapter 17).

To enable pgpro_temp_stats, complete the following steps:

  1. Add the library name to the shared_preload_libraries variable in the postgresql.conf file.

    shared_preload_libraries = 'pgpro_temp_stats'
    
  2. Reload the database server for the changes to take effect.

    To verify that pgpro_temp_stats is installed correctly, you can run the following command:

    SHOW shared_preload_libraries;
    
  3. Create the pgpro_temp_stats extension using the following query:

    CREATE EXTENSION pgpro_temp_stats;
    
  4. Enable the extension by setting the pgpro_temp_stats.enable parameter to on.

    SET pgpro_temp_stats.enable = 'on';
    

G.9.3. Usage #

The pgpro_temp_stats extension automatically searches for queries that reference temporary tables and collects the required statistical data. This data allows the planner to predict row cardinality for such queries more accurately and reduce query execution time.

G.9.3.1. Basic Usage #

pgpro_temp_stats performs the following actions for each temporary table in a query:

  1. Executes the CREATE STATISTICS command for columns referenced in the query.

    This command creates an extended statistics object with a name in the following format: stN_tablename. Information about this created statistics object is added to the pg_statistic_ext system catalog as for all other statistics objects.

    This step is performed only if the pgpro_temp_stats.create_statistics parameter is set to on (default), and the query references two or more columns of a temporary table. You can set this parameter to off to skip this step for all queries.

  2. Executes the ANALYZE command for the same columns referenced in the query.

    Collected statistical data is added to the pg_statistic system catalog as for manual command runs.

For instance, for a sample query referencing two columns of one temporary table, pgpro_temp_stats operation looks as follows:

SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
DEBUG:  CREATE STATISTICS st0_t1 ON a, b FROM t1;
DEBUG:  ANALYZE t1 (a, b)

pgpro_temp_stats logs messages about its operation at the DEBUG1 severity level.

If the pgpro_temp_stats.skip_analyze parameter is set to on (default), all full-table ANALYZE commands on temporary tables that run automatically or manually are ignored while the pgpro_temp_stats extension is enabled. You can set this parameter to off to execute such commands in addition to pgpro_temp_stats operations.

G.9.3.2. Usage for Joined Tables #

pgpro_temp_stats can collect statistics for queries that join two or more temporary tables or join temporary tables with permanent tables. In these cases, pgpro_temp_stats collects statistics on columns of temporary tables that are used in join conditions.

SELECT * FROM t2 JOIN t3 ON t2.id = t3.id AND t2.val = t3.val;
DEBUG:  CREATE STATISTICS st0_t2 ON id, val from t2;
DEBUG:  ANALYZE t2 (id, val)
DEBUG:  CREATE STATISTICS st0_t3 ON id, val from t3;
DEBUG:  ANALYZE t3 (id, val)

G.9.3.3. Usage for Expressions and Functions #

pgpro_temp_stats can also collect statistics for queries with WHERE clauses that use expressions or functions referencing columns of temporary tables. In these cases, pgpro_temp_stats can execute the CREATE STATISTICS command both on separate columns and on entire expressions and functions. The ANALYZE command is executed on columns used in expressions or functions.

SELECT * FROM t4 WHERE a > 10 AND date_trunc('week', b) = date '2023-05-01';
DEBUG:  CREATE STATISTICS st0_t4 ON a, date_trunc('week', b) from t4;
DEBUG:  ANALYZE t4 (a, b)

G.9.3.4. Storing Processed Queries #

pgpro_temp_stats internally stores IDs of queries for which statistical data was collected. This allows pgpro_temp_stats to avoid collecting statistics multiple times and reuse statistical data.

The pgpro_temp_stats.ts_max_items parameter specifies the maximum number of stored query IDs. If this limit is reached, pgpro_temp_stats removes IDs from its internal storage using the Least Recently Used (LRU) algorithm.

You can also use the pgpro_temp_stats_reset function to clear all query IDs in this internal storage.

G.9.4. Example #

This example demonstrates how the pgpro_temp_stats extension works.

-- The pgpro_temp_stats extension is disabled

-- Create a sample temporary table
CREATE TEMP TABLE t1 (
    a int,
    b int
);

-- Execute ANALYZE for the entire table
ANALYZE t1;

-- Add data to the created table
INSERT INTO t1 SELECT i/100, i/500 FROM generate_series(1, 10000) s(i);

-- Execute a query that references two columns of the created temporary table
-- The estimated number of rows is significantly lower than the actual number of rows
EXPLAIN ANALYZE
SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
                                        QUERY PLAN
------------------------------------------------------------------------------------------------
Seq Scan on t1  (cost=0.00..198.22 rows=1 width=8) (actual time=0.059..3.560 rows=100 loops=1)
Filter: ((a = 1) AND (b = 0))
Rows Removed by Filter: 9900
Buffers: local hit=45
Planning:
Buffers: shared hit=24
Planning Time: 0.313 ms
Execution Time: 3.594 ms
(8 rows)

-- Enable the pgpro_temp_stats extension
SET pgpro_temp_stats.enable = true;

-- Configure messages to see information about creating statistics and analyzing
SET client_min_messages = 'debug1';

-- Execute the same query. For columns referenced by this query,
-- pgpro_temp_stats automatically creates an extended statistics object and runs ANALYZE
-- As a result, the number of rows is estimated more accurate
EXPLAIN ANALYZE
SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
DEBUG:  CREATE STATISTICS st0_t1 ON a, b FROM t1;
DEBUG:  ANALYZE t1 (a, b)
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on t1  (cost=0.00..195.00 rows=100 width=8) (actual time=0.090..2.981 rows=100 loops=1)
Filter: ((a = 1) AND (b = 0))
Rows Removed by Filter: 9900
Buffers: local hit=45
Planning:
Buffers: shared hit=143 dirtied=6 written=6, local hit=45
Planning Time: 12.958 ms
Execution Time: 3.009 ms
(8 rows)

G.9.5. Configuration Parameters #

pgpro_temp_stats.enable (boolean) #

Enables or disables the pgpro_temp_stats extension. The default value is off. Only superusers can change this setting.

pgpro_temp_stats.create_statistics (boolean) #

Specifies whether to run the CREATE STATISTICS command automatically for columns of temporary tables that are referenced by queries. The default value is on. Only superusers can change this setting.

pgpro_temp_stats.skip_analyze (boolean) #

Specifies whether to ignore full-table ANALYZE commands on temporary tables that run automatically or manually while the pgpro_temp_stats extension is enabled. The default value is on. Only superusers can change this setting.

pgpro_temp_stats.ts_max_items (integer) #

Specifies the maximum number of stored IDs of queries referencing temporary tables for which statistical data was collected. The default value is 4096.

G.9.6. Functions #

pgpro_temp_stats_reset() returns bigint #

Removes all stored IDs of queries referencing temporary tables for which statistical data was collected. Returns the number of removed entries.