shardman-loader — Shardman bulk-load utility
shardman-loader [options] coordinator_connstr table_name
Here options are:
[--num-conn=number_of_connections] [--rows-per-xact=rows_per_xact] [--no-twophase] [--delimiter=delimiter] [--quote=quote] [--escape=escape] [--print-progress] [--report-every-rows=number_of_rows] [--file-path=path_to_file] [--format=text|csv] [--help]
shardman-loader is a
Shardman bulk-load utility. Use it
for fast loading of text and CSV
data into a Shardman cluster.
Internally, shardman-loader connects to the coordinator node
using the provided connection string and retrieves information about all the other available
nodes in the cluster. It then opens
multiple connections, up to the number set by --num-conn.
By default, exactly one connection
per replication group is opened. This provides sufficient
load performance in many cases, but you can try setting
--num-conn to twice the number of nodes in the cluster
to achieve more parallelism. Inside each opened connection,
shardman-loader utilizes an optimized
COPY FROM
command to improve the speed of routing data between nodes.
For atomicity of data loading, shardman-loader uses the two-phase commit (2PC) protocol.
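As a rough illustration of the sizing advice above, the following shell sketch derives a value for --num-conn from a node count and prints the resulting command line instead of running it. The node count, file path, connection string, and table name are placeholder assumptions, not values taken from a real cluster.

```shell
# Hypothetical values: adjust them to your cluster.
NODES=4                      # assumed number of nodes in the cluster
NUM_CONN=$((NODES * 2))      # twice the number of nodes, per the advice above
CONNSTR='port=5432 user=postgres dbname=postgres'
TABLE='table_name'

# Build the command line. It is only printed here, not executed.
CMD="shardman-loader --num-conn=${NUM_CONN} --file-path=/path/to/table_data.csv --format=csv '${CONNSTR}' ${TABLE}"
echo "$CMD"
```

Whether doubling the connection count actually helps depends on the cluster and the data, so treat the computed value as a starting point for experiments.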
shardman-loader accepts the following command-line options:
-c number_of_connections
--num-conn=number_of_connections
Specifies the number of connections to use for data loading. The default is the number of replication groups.
-r rows_per_xact
--rows-per-xact=rows_per_xact
Makes shardman-loader commit
the current transaction and start a new one after every
rows_per_xact rows. By
default, all rows in each connection are committed in a single
transaction.
--no-twophase
Turns off the use of the 2PC protocol.
-d delimiter
--delimiter=delimiter
Specifies the character that separates columns within each row
(line) of the file. The default is a tab character in text format and
a comma in CSV format.
This must be a single one-byte character.
-q quote
--quote=quote
Specifies the quoting character to be used when a data value is quoted.
The default is double-quote.
This must be a single one-byte character.
This option is allowed only when using CSV format.
-e escape
--escape=escape
Specifies the character that should appear before a
data character that matches the quote value.
The default is the same as the quote value (so that
the quoting character is doubled if it appears in the data).
This must be a single one-byte character.
This option is allowed only when using CSV format.
-P
--print-progress
Reports progress during data loading.
--report-every-rows=number_of_rows
Specifies the number of rows between progress reports if
--print-progress is enabled.
-f file_path
--file-path=file_path
Specifies the path to the text or CSV file to load.
By default, shardman-loader reads
data from STDIN.
-F format
--format=format
Specifies the format of the input data. Possible values are text
and csv. The default is text.
-h
--help
Shows the list of all available options.
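To make the CSV-related options above more concrete, the sketch below writes a tiny sample file that follows the documented defaults: a comma as the delimiter, a double quote as the quoting character, and the same character as the escape, so an embedded quote is doubled. The temporary file and its contents are made up for illustration.

```shell
# Hypothetical sample file; the name and contents are illustrative only.
SAMPLE=$(mktemp)

# The second field of row 2 contains both a comma and double quotes.
# With the default settings, the field is wrapped in double quotes and
# each embedded double quote is doubled.
cat > "$SAMPLE" <<'EOF'
1,plain value
2,"said ""hello"", then left"
EOF

cat "$SAMPLE"
```

A file like this would be loaded with --format=csv. The --delimiter, --quote, and --escape options only need to be set when the data uses other characters.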
Assume that the primary of one of the replication groups is running on
port 5432 on the same physical node as shardman-loader
and that the utility connects to the postgres database with
the postgres role. The following command loads data from
table_data.csv:
postgres$ shardman-loader -P --file-path=/path/to/table_data.csv --format=csv 'port=5432 user=postgres dbname=postgres' table_name
Delim ',', fmt csv
100000 rows sent, 0 xacts performed, 0s elapsed
200000 rows sent, 0 xacts performed, 0s elapsed
200000 rows sent, 2 xacts performed, 0s elapsed
All transactions were successfully prepared with gid containing "shmn_loader_large_test".
Completed 2 xacts with 200000 rows. Proceeding to commit them...
Done
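A variation on the example above, sketched under the same assumptions, reads the CSV data from STDIN and commits in batches. It uses only options documented on this page. The batch size and the reporting interval are arbitrary illustrative values, and the script prints the command instead of running it when shardman-loader or the input file is not available.

```shell
# Placeholder connection string and table name, as in the example above.
CONNSTR='port=5432 user=postgres dbname=postgres'
TABLE='table_name'
ROWS_PER_XACT=50000    # assumed batch size; tune it for your workload

# No --file-path is given, so data is read from STDIN.
# Commit every ROWS_PER_XACT rows and report progress every 100000 rows.
LOADER_ARGS="--format=csv --rows-per-xact=${ROWS_PER_XACT} --print-progress --report-every-rows=100000"

if command -v shardman-loader >/dev/null 2>&1 && [ -f /path/to/table_data.csv ]; then
    # LOADER_ARGS is left unquoted on purpose so that it splits into separate options.
    shardman-loader $LOADER_ARGS "$CONNSTR" "$TABLE" < /path/to/table_data.csv
else
    echo "would run: shardman-loader $LOADER_ARGS '$CONNSTR' $TABLE < /path/to/table_data.csv"
fi
```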
COPY FROM