The pgpro_planner extension provides query transformations and optimizations that allow producing efficient query execution plans and improving performance.
These optimizations are the following:
simplification
of x + 0, x - 0, x * 1,
x / 1, and x * 0 operations to
x
caching results
of correlated subqueries using the Memoize nodes
By default, the extension is disabled to avoid implicit or unexpected query plan changes. You can enable it as described below. Each transformation can also be enabled or disabled separately using the corresponding configuration parameters to fit specific user needs.
The pgpro_planner extension is provided with
Postgres Pro Enterprise as a separate pre-built package
pgpro-planner-ent-18
(for the detailed installation instructions, see Chapter 17).
To enable pgpro_planner, do the following:
Load the extension in one of the following ways:
globally for all sessions by adding the library name to the
shared_preload_libraries variable in the
postgresql.conf file
shared_preload_libraries = 'pgpro_planner'
in the current session using the LOAD
command without adding the library name to shared_preload_libraries
LOAD 'pgpro_planner';
Reload the database server for the changes to take effect.
To verify that the pgpro_planner library is installed correctly, you can run the following command:
SHOW shared_preload_libraries;
Enable the extension by setting the pgpro_planner.enable
parameter to on.
SET pgpro_planner.enable = 'on';
Even for trivial arithmetic operations, such as x + 0,
the standard planner predicts selectivity and cardinality as it does for
expressions and cannot use indexes in query execution plans.
The pgpro_planner extension can simplify and rewrite the following basic arithmetic operations:
x + 0 → x
x - 0 → x
x * 1 → x
x / 1 → x
x * 0 → x (only if x
is not NULL)
This simplification allows the planner not only to avoid unnecessary calculations, but also to reduce estimation errors, produce plans with indexes and effective access methods, and thus significantly improve performance.
You can disable this simplification by setting the
pgpro_planner.enable_simplify_trivials
parameter to off, for instance, for debugging purposes.
This example demonstrates how pgpro_planner arithmetic simplification affects execution plans for the following query:
SELECT * FROM t WHERE x + 0 > 900;
For the initial arithmetic operation, the planner produces the query
execution plan with Seq Scan and does not use an index.
Seq Scan on t (cost=0.00..20.00 rows=333 width=4) (actual time=0.500..0.571 rows=100.00 loops=1) Filter: ((x + 0) > 900) Rows Removed by Filter: 900 Buffers: shared hit=5 Planning: Buffers: shared hit=14 Planning Time: 0.432 ms Execution Time: 0.627 ms (8 rows)
When pgpro_planner simplifies this operation,
the planner creates the plan with Index Only Scan
and performs value comparisons in Index Cond, which
helps to reduce query execution time.
Index Only Scan using idx_t_x on t (cost=0.28..10.03 rows=100 width=4) (actual time=0.077..0.247 rows=100.00 loops=1) Index Cond: (x > 900) Heap Fetches: 100 Index Searches: 1 Buffers: shared hit=4 Planning Time: 0.217 ms Execution Time: 0.319 ms (7 rows)
pgpro_planner has the following limitations for arithmetic simplification:
Only the following operations are supported: x + 0,
x - 0, x * 1,
x / 1, and x * 0 (if
x is not NULL).
Only the following data types for constants are supported:
int2, int4, float, and
numeric.
Operations that use complex expressions to get
0 or 1, such as
x - 100 - 100, cannot be simplified.
Operations that use volatile functions cannot be simplified.
Correlated subqueries use values from their outer queries, for instance,
via the WHERE clause. These subqueries are executed once
for each row of the outer query, so they can have major impact on performance.
The pgpro_planner extension can cache results of
parameterized correlated subqueries by adding the Memoize
nodes to query execution plans. This allows skipping identical scans for
the same parameters and improving performance.
This functionality is enabled if the global
enable_memoize configuration parameter and the
pgpro_planner.memoize_subplan parameter of
pgpro_planner are set to on
(default).
pgpro_planner processes a query during the plan creation stage. It performs the following steps:
pgpro_planner searches for supported correlated subqueries.
If the pgpro_planner.memoize_check_uniqueness
parameter is set to on (default),
pgpro_planner checks uniqueness for values
of columns referenced in subqueries based on statistical information.
This helps to determine whether the Memoize node
can be useful.
The Memoize node can be added in the following cases:
A unique index on a column does not exist, and column values are not unique according to statistical information.
A unique index on a column exists but a column contains
NULL values according to statistical information.
A multicolumn unique index exists but values of the referenced column are not unique according to statistical information.
Statistical information for a column is not gathered yet regardless of whether an index exists.
pgpro_planner estimates the expected cost
of the resulting plan with the Memoize
node.
If the estimated cost is lower than the cost of the initial plan,
pgpro_planner adds the
Memoize node into the head of the corresponding
subplan.
You can disable caching of correlated subqueries by setting the
pgpro_planner.memoize_subplan
parameter to off, for instance, if you use correlated
subqueries rarely. To disable uniqueness check for debugging purposes, set
the pgpro_planner.memoize_check_uniqueness parameter to
off.
This example demonstrates how caching of correlated subqueries works in different cases.
Create two sample tables named t and
t1 and a unique index on the
x column of the first table.
CREATE TABLE t (x int, y int); INSERT INTO t SELECT id, id%20 FROM generate_series(1,1000) id; CREATE UNIQUE INDEX t_x_idx ON t (x); CREATE TABLE t1 (x1 int); INSERT INTO t1 SELECT id%2 FROM generate_series(1,1000) id;
Run the following query, where the subquery uses values of the
correlated column named y. There is no unique
index on this column, so the Memoize node is added to
the query execution plan.
EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.y LIMIT 1);
QUERY PLAN
------------------------------------------------
Seq Scan on t
Filter: (SubPlan 1)
SubPlan 1
-> Memoize
Cache Key: t.y
Cache Mode: binary
-> Limit
-> Aggregate
-> Seq Scan on t1
Filter: (x1 = t.y)
(10 rows)
Run another query, where the subquery uses values of the correlated
column named x. The unique index on this
column exists but statistical information is not gathered yet, so the
Memoize node is generated.
EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.x LIMIT 1);
QUERY PLAN
------------------------------------------------
Seq Scan on t
Filter: (SubPlan 1)
SubPlan 1
-> Memoize
Cache Key: t.x
Cache Mode: binary
-> Limit
-> Aggregate
-> Seq Scan on t1
Filter: (x1 = t.x)
(10 rows)
Execute the same query once again. Statistical information for the
x column is now gathered, and the unique index
still exists, so the Memoize node is not added to the
plan.
EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.x LIMIT 1);
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on t
Filter: (x = (SubPlan 1))
SubPlan 1
-> Limit
-> Aggregate
-> Seq Scan on t1
Filter: (x1 = t.x)
(7 rows)
Remove the previously created index and create a new multicolumn index
on both columns named x and y.
CREATE UNIQUE INDEX on t(x,y); DROP INDEX t_x_idx;
Run the query referencing the correlated y
column. Although the multicolumn unique index now exists, values of this
column are not unique according to statistical information. The
Memoize node is added to the resulting plan.
EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.y LIMIT 1);
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on t
Filter: (((x = (SubPlan 1).col1) AND (y = (SubPlan 1).col2)))
SubPlan 1
-> Memoize
Cache Key: t.y
Cache Mode: binary
-> Limit
-> Aggregate
-> Seq Scan on t1
Filter: (x1 = t.y)
(10 rows)
Run the query referencing the correlated x column.
For this case, the Memoize node is not still generated,
since values of this column are unique according to statistical information.
EXPLAIN (COSTS OFF)
SELECT x FROM t WHERE (t.x,t.y) = (SELECT sum(x1),1 FROM t1 WHERE x1 = t.x LIMIT 1);
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on t
Filter: (((x = (SubPlan 1).col1) AND (y = (SubPlan 1).col2)))
SubPlan 1
-> Limit
-> Aggregate
-> Seq Scan on t1
Filter: (x1 = t.x)
(7 rows)
pgpro_planner cannot cache results of a correlated subquery in the following cases:
A subquery uses volatile functions.
A subquery uses non-standard operators.
A subquery uses operands with different types without compatible hash operator classes.
A subquery uses the min or max
aggregate functions.
A query plan tree does not contain the corresponding subplan node.
The estimated cost of the target query plan with the
Memoize nodes is greater than the cost of the
initial plan.
pgpro_planner.enable (boolean)
#
Enables or disables the pgpro_planner
extension. The default value is off to avoid
implicit or unexpected query plan changes.
pgpro_planner.enable_simplify_trivials (boolean)
#
Enables or disables
simplification
of trivial arithmetic operations. The default value is
on.
pgpro_planner.memoize_subplan (boolean)
#
Enables or disables
caching of
correlated subqueries. The default value is on.
pgpro_planner.memoize_check_uniqueness (boolean)
#
Enables or disables the smart logic to check uniqueness of
column values for caching
of correlated subqueries. The default value is
on.
pgpro_planner.force_memoize_subplan (boolean)
#
Enables or disables automatic addition of the Memoize
node for all correlated subqueries ignoring the plan cost model
and uniqueness check. This parameter is designed only for debugging
purposes, do not use it in production. The default value is
off.