You can specify the following Parquet file storage parameters in a JSON file
and apply them when executing the metastore.add_files or metastore.copy_table
stored procedure:
compression: The data compression algorithm.
Possible values: snappy, zstd, gzip, lz4/lz4_raw, brotli, uncompressed.
compression_level: The data compression level.
Possible values: 1 to 22.
Default value: 3.
Optional parameter. It is ignored unless the compression algorithm is zstd.
row_group_size: The maximum number of rows in a row group.
Larger values give better compression. Smaller values produce more row groups,
so more threads can be used when reading Parquet files and statistics-based
filtering is more effective.
Minimum value: 2048.
Default value: 122_880.
Recommended range: 100_000 to 1_000_000.
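To make the row_group_size trade-off concrete, the following Python sketch estimates how many row groups a table would be split into for several settings. It assumes that every row group except the last is filled to the maximum, which is a simplification; the function name and the 10-million-row table are hypothetical and serve only as an illustration.

```python
import math


def row_group_count(total_rows: int, row_group_size: int) -> int:
    """Estimate the number of row groups, assuming each group is filled to the maximum."""
    if row_group_size < 2048:
        # 2048 is the documented minimum for row_group_size.
        raise ValueError("row_group_size must be at least 2048")
    return math.ceil(total_rows / row_group_size)


total_rows = 10_000_000  # hypothetical table size
for size in (122_880, 500_000, 1_000_000):
    # More row groups means more units for parallel reads and statistics filtering;
    # fewer, larger row groups favor compression.
    print(size, row_group_count(total_rows, size))
```

Under these assumptions, the default value splits the table into many more row groups than a setting near the top of the recommended range.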
Example 29.3.
{
    "compression": "zstd",
    "compression_level": 9,
    "row_group_size": 500000
}
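Before applying such a file, it may be useful to check the values against the limits listed above. The following Python sketch is one possible way to do that; it is not part of the product. The function validate_storage_params is hypothetical, and the code assumes that "lz4/lz4_raw" denotes two accepted spellings and that every parameter may be omitted. Note that JSON numbers cannot contain underscores, so a value such as 122_880 is written as 122880 in the file.

```python
import json

# Assumption: "lz4/lz4_raw" in the parameter list is read as two accepted spellings.
ALLOWED_COMPRESSION = {
    "snappy", "zstd", "gzip", "lz4", "lz4_raw", "brotli", "uncompressed",
}


def _is_int(value) -> bool:
    """True for integers, excluding booleans (bool is a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_storage_params(params: dict) -> list:
    """Return a list of problems found; an empty list means the values look valid."""
    errors = []

    compression = params.get("compression")
    if compression is not None and compression not in ALLOWED_COMPRESSION:
        errors.append(f"unknown compression: {compression!r}")

    level = params.get("compression_level")
    if level is not None and not (_is_int(level) and 1 <= level <= 22):
        errors.append("compression_level must be an integer from 1 to 22")

    size = params.get("row_group_size")
    if size is not None and not (_is_int(size) and size >= 2048):
        errors.append("row_group_size must be an integer of at least 2048")

    return errors


params = {"compression": "zstd", "compression_level": 9, "row_group_size": 500000}
problems = validate_storage_params(params)
if problems:
    print("\n".join(problems))
else:
    # Serialize to JSON text; save this text to the file used with the stored procedure.
    print(json.dumps(params, indent=4))
```

The check deliberately reports all problems at once rather than stopping at the first one, so a file can be corrected in a single pass.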