8.3. DML optimizations

While evaluating performance of DML statements, it's important to understand how they are processed in Shardman.

First of all, execution of INSERT significantly differs from execution of UPDATE and DELETE statements. The behavior of INSERT is controlled by batch_size foreign server option, which can be set in FDWOptions section of Shardman configuration file . If batch_size is more than 0, INSERTing in the same statement several values which falls into the same foreign partition are grouped together in batches of specified size. Remote INSERT statements are prepared with the necessary number of parameters and then are executed with the given values. If number of values doesn't match the number of prepared arguments, the modified statement with the necessary number of parameters is prepared again. This optimization can fail if transaction inserts records one-by-one or records routed to different foreign tables are intermixed in one INSERT statement. A batch is formed for a single foreign modify operation. It is sent to the remote server when the batch is filled or when the modify operation is over. The modify operation is over when we start routing tuples to another sharded table partition. So, for bulk load inserting multiple values in a single INSERT command or using COPY is recommended (as COPY is optimized in a similar way). Large batch_size values allow to issue less INSERT statements on remote side and so significantly reduce communication costs. However, during construction of parameters for prepared INSERT statements all inserted values should be copied to libpq-allocated memory. This can lead to unrestricted memory usage on the side of a query coordinator when several large text or bytea objects are loaded.

UPDATE and DELETE statements can be executed in direct or indirect mode. A direct mode is used when statement can be directly sent to a foreign server. In this mode on the base of original ModifyTable plan node, a new statement is created to modify a table on the remote server. Using direct update is not always possible. In particular, it's impossible when some conditions should be evaluated locally. In this case much less efficient indirect modification is used. An indirect modification includes several statements - the first one is SELECT FOR UPDATE to lock remote rows. The second is a real UPDATE or DELETE which is executed for each row of the first statement result after local filters are applied to the result. Evidently, direct modifications are much more efficient.

You can easily identify if DML statement is going to be executed in direct or indirect mode looking at the query plan. A typical example of indirect modification is:

EXPLAIN VERBOSE DELETE FROM pgbench_history
WHERE bid = 20 AND mtime > '2023-03-14 10:00:00'::timestamptz;
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Delete on public.pgbench_history  (cost=100.00..142.66 rows=0 width=0)
   Foreign Delete on public.pgbench_history_17_fdw pgbench_history_1
     Remote SQL: DELETE FROM public.pgbench_history_17 WHERE ctid = $1
   ->  Foreign Scan on public.pgbench_history_17_fdw pgbench_history_1  (cost=100.00..142.66 rows=4 width=10)
         Output: pgbench_history_1.tableoid, pgbench_history_1.ctid
         Filter: (pgbench_history_1.mtime > '2023-03-14 10:00:00+03'::timestamp with time zone)
         Remote SQL: SELECT mtime, ctid FROM public.pgbench_history_17 WHERE ((bid = 20)) FOR UPDATE

If we had chosen another type for string constant, this would become direct update.

EXPLAIN VERBOSE DELETE FROM pgbench_history
WHERE bid = 20 AND mtime > '2023-03-14 10:00:00'::timestamp;
explain verbose delete from pgbench_history where bid = 20 and mtime > '2023-03-14 10:00:00'::timestamp;
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on public.pgbench_history  (cost=100.00..146.97 rows=0 width=0)
   Foreign Delete on public.pgbench_history_17_fdw pgbench_history_1
   ->  Foreign Delete on public.pgbench_history_17_fdw pgbench_history_1  (cost=100.00..146.97 rows=4 width=10)
         Remote SQL: DELETE FROM public.pgbench_history_17 WHERE ((mtime > '2023-03-14 10:00:00'::timestamp without time zone)) AND ((bid = 20))

In direct update mode we see that only one statement is executed on the remote server.