After you retrieve the columns of an analytical table, you can retrieve its Parquet files.
You can filter the retrieved Parquet files using statistics from the pga_file_column_statistics metadata table.
To retrieve Parquet files and filter them by column values, execute the following query:
SELECT data_file_id
FROM pga_file_column_statistics
WHERE
table_id = table_ID AND
column_id = column_ID AND
(SCALAR >= min_value OR min_value IS NULL) AND
(SCALAR <= max_value OR max_value IS NULL);
Where:
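table_ID: The ID of the analytical table that the Parquet files belong to, taken from the pga_table metadata table.
column_ID: The ID of the column, taken from the pga_column metadata table, whose values are used for filtering Parquet files.
SCALAR: The scalar value that is compared against the minimum and maximum values of the column in each Parquet file.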
In this example, only Parquet files that may contain the SCALAR value in the column_ID column are retrieved. Files whose minimum and maximum values rule the value out are skipped.
You can filter column values using different conditions, such as "greater than (>)", by updating the query accordingly.
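As a rough illustration of such a variation, the sketch below uses Python's built-in sqlite3 module with a small stand-in table. The table name file_stats, its sample rows, and the helper files_for_greater_than are hypothetical and only mimic the columns used in the query above; the statistics are stored as integers here for simplicity. For a condition of the form "column > SCALAR", a file can be skipped only when its maximum value is not greater than SCALAR.

```python
import sqlite3

# Hypothetical stand-in for the statistics table (an assumption for this
# sketch); the real metadata table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_stats ("
    "data_file_id INTEGER, table_id INTEGER, column_id INTEGER, "
    "min_value INTEGER, max_value INTEGER)"
)
conn.executemany(
    "INSERT INTO file_stats VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 0, 10),       # file 1 holds values 0..10
        (2, 1, 1, 11, 20),      # file 2 holds values 11..20
        (3, 1, 1, None, None),  # no statistics, so the file cannot be skipped
    ],
)


def files_for_greater_than(conn, table_id, column_id, scalar):
    """Return IDs of files that may hold rows with column > scalar."""
    rows = conn.execute(
        "SELECT data_file_id FROM file_stats "
        "WHERE table_id = ? AND column_id = ? "
        "AND (max_value > ? OR max_value IS NULL) "
        "ORDER BY data_file_id",
        (table_id, column_id, scalar),
    )
    return [row[0] for row in rows]


print(files_for_greater_than(conn, 1, 1, 10))
```

Only the upper bound matters for a "greater than" condition, so the min_value check from the original query is dropped. The IS NULL branch is kept so that files without statistics are still read.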
The minimum and maximum values of each column are stored as strings and must be cast to the column's data type (for example, to integers) before they are compared.
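The following sketch shows one way the conversion might look inside the filtering query. It assumes, for illustration only, that the statistics are kept in a textual form and that the column holds integers. The table file_stats_text, its sample rows, and the helper files_for_value are hypothetical, and the CAST syntax shown is the SQLite form used by Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in table: statistics are kept as text (an assumption).
conn.execute(
    "CREATE TABLE file_stats_text ("
    "data_file_id INTEGER, table_id INTEGER, column_id INTEGER, "
    "min_value TEXT, max_value TEXT)"
)
conn.executemany(
    "INSERT INTO file_stats_text VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "5", "100"),    # file 1 holds values 5..100
        (2, 1, 1, "150", "900"),  # file 2 holds values 150..900
    ],
)


def files_for_value(conn, table_id, column_id, scalar):
    """Return IDs of files whose [min, max] range may contain scalar."""
    rows = conn.execute(
        "SELECT data_file_id FROM file_stats_text "
        "WHERE table_id = ? AND column_id = ? "
        # Cast the stored text to integers so the comparison is numeric.
        "AND (? >= CAST(min_value AS INTEGER) OR min_value IS NULL) "
        "AND (? <= CAST(max_value AS INTEGER) OR max_value IS NULL) "
        "ORDER BY data_file_id",
        (table_id, column_id, scalar, scalar),
    )
    return [row[0] for row in rows]


print(files_for_value(conn, 1, 1, 20))
```

Without the cast, a textual comparison would place "20" after "100" and before "5", so the file holding values 5..100 could be skipped by mistake.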