shardman-loader

shardman-loader — Shardman bulk-load utility

Synopsis

shardman-loader [options] coordinator_connstr table_name

Here options are:

[--num-conn=number_of_connections] [--rows-per-xact=rows_per_xact] [--no-twophase] [--delimiter=delimiter] [--quote=quote] [--escape=escape] [--print-progress] [--report-every-rows=number_of_rows] [--file-path=path_to_file] [--format= text | csv ] [--help]

Description

shardman-loader is a Shardman bulk-load utility. Use it for fast loading of text and CSV data into a Shardman cluster.

Internally, shardman-loader connects to the coordinator node using the provided connection string and retrieves information about all other available nodes in the cluster. It then opens multiple connections, up to the limit set by --num-conn. By default, exactly one connection per replication group is opened. Although this provides sufficient load performance in many cases, consider setting --num-conn to twice the number of nodes in the cluster for more parallelism. Within each connection, shardman-loader uses an optimized COPY FROM command to speed up routing of data between nodes.
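
As a rough sketch of the sizing advice above, the following shell fragment derives a --num-conn value from a node count and prints the resulting command line without running it. The node count, connection string, file path, and table name are placeholders assumed for illustration; they are not values reported by Shardman.

```shell
# Hypothetical values: adjust them to your cluster.
NODES=4                      # assumed number of nodes in the cluster
NUM_CONN=$((NODES * 2))      # twice the node count, as suggested above
CONNSTR='port=5432 user=postgres dbname=postgres'

# Build and print the command instead of executing it, so it can be reviewed first.
CMD="shardman-loader --num-conn=$NUM_CONN --format=csv --file-path=/path/to/table_data.csv '$CONNSTR' table_name"
echo "$CMD"
```

Whether the doubled connection count actually helps depends on the workload, so it is worth comparing load times against the default.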

To make data loading atomic, shardman-loader uses the two-phase commit (2PC) protocol unless --no-twophase is specified.

Options

shardman-loader accepts the following command-line options:

-c number_of_connections
--num-conn=number_of_connections

Specifies the number of connections to use for data loading. The default is the number of replication groups.

-r rows_per_xact
--rows-per-xact=rows_per_xact

Makes shardman-loader commit the current transaction and start a new one after every rows_per_xact rows. By default, all rows in each connection are committed in a single transaction.

--no-twophase

Turns off the use of the 2PC protocol.

-d delimiter
--delimiter=delimiter

Specifies the character that separates columns within each row (line) of the file. The default is a tab character in text format and a comma in CSV format. This must be a single one-byte character.

-q quote
--quote=quote

Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.

-e escape
--escape=escape

Specifies the character that should appear before a data character that matches the quote value. The default is the same as the quote value (so that the quoting character is doubled if it appears in the data). This must be a single one-byte character. This option is allowed only when using CSV format.

-P
--print-progress

Reports progress during data loading.

--report-every-rows=number_of_rows

Specifies the number of rows between progress reports if --print-progress is enabled.

-f file_path
--file-path=file_path

Specifies the path to the text or CSV file to load. By default, shardman-loader reads data from STDIN.

-F format
--format=format

Specifies the format of the input data. Possible values are text and csv. The default is text.

-h
--help

Shows the list of all available options.
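
To illustrate how the --quote and --escape defaults described above interact, here is a minimal sketch that writes a two-row CSV sample in which a quoted value contains the quote character itself, doubled as the default escape requires. The sample contents, temporary file, connection string, and table name are made up for illustration, and the command is only printed, not executed.

```shell
# Create a throwaway sample file (hypothetical data).
SAMPLE=$(mktemp)

# Row 1 holds the value: He said "hi"   (inner quotes doubled)
# Row 2 holds a value with an embedded delimiter, protected by quoting.
printf '%s\n' '1,"He said ""hi"""' '2,"plain, with comma"' > "$SAMPLE"
cat "$SAMPLE"

# Print an invocation that spells out the CSV defaults explicitly.
echo "shardman-loader --format=csv --quote='\"' --escape='\"' --file-path=$SAMPLE 'port=5432 user=postgres dbname=postgres' table_name"
```

Remove the temporary file once it is no longer needed.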

Examples

Loading Data from a CSV File

Assume that the primary of one of the replication groups is listening on port 5432 on the same physical node where shardman-loader runs, and that the utility connects to the postgres database as the postgres role. The following command loads data from table_data.csv:

postgres$ shardman-loader -P --file-path=/path/to/table_data.csv --format=csv 'port=5432 user=postgres dbname=postgres' table_name

Delim ',', fmt csv
100000 rows sent, 0 xacts performed, 0s elapsed
200000 rows sent, 0 xacts performed, 0s elapsed
200000 rows sent, 2 xacts performed, 0s elapsed
All transactions were successfully prepared with gid containing "shmn_loader_large_test". Completed 2 xacts with 200000 rows. Proceeding to commit them...
Done
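
Loading Data from STDIN

Because shardman-loader reads from STDIN when --file-path is not given, data can also be piped in from another command. The sketch below is an assumption-laden illustration: the two tab-separated rows, the --rows-per-xact value, the connection string, and the table name are all placeholders, and the load is skipped if the utility is not installed.

```shell
# Hypothetical text-format input: two rows, columns separated by tabs.
DATA=$(printf '1\tone\n2\ttwo\n')

if command -v shardman-loader >/dev/null 2>&1; then
    # No --file-path, so the rows are read from STDIN.
    printf '%s\n' "$DATA" | shardman-loader --rows-per-xact=100000 --format=text \
        'port=5432 user=postgres dbname=postgres' table_name
else
    echo "shardman-loader not found; skipping load"
fi
```

In practice the left side of the pipe would usually be a decompressor or an export tool rather than an inline string.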

See Also

COPY FROM