21.1. Functions for Reading OLAP Data from a Storage

21.1. Functions for Reading OLAP Data from a Storage
Prev	Up	Chapter 21. Functions	Home	Next

21.1. Functions for Reading OLAP Data from a Storage #

read_parquet(path TEXT or TEXT[], optional_parameters) returns SETOF duckdb.row #

Reads a Parquet file from a storage.

Returns DuckDB rows. You can expand these rows using * or select specific columns using the r['column_name'] syntax. To select specific columns, give the function call an easy alias, such as r:

    SELECT * FROM read_parquet('file.parquet');
    SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
    SELECT COUNT(*) FROM read_parquet('file.parquet');

Required parameters:

Name	Type	Description
`path`	`text` or `text[]`	The path to the storage with Parquet files to read. It can be a glob or array of files.

Optional parameters mirror the DuckDB read_parquet function. To specify optional parameters, use parameter := 'value'.

read_csv(path TEXT or TEXT[], optional_parameters) returns SETOF duckdb.row #

Reads a CSV file from a storage.

  SELECT * FROM read_csv('file.csv');
  SELECT r['id'], r['name'] FROM read_csv('file.csv') r WHERE r['age'] > 21;
  SELECT COUNT(*) FROM read_csv('file.csv');

Required parameters:

Name	Type	Description
`path`	`text` or `text[]`	The path to the storage with CSV files to read. It can be a glob or array of files.

Optional parameters mirror the DuckDB read_csv function. To specify optional parameters, use parameter := 'value'.

Note

columns is not currently supported
nullstr must be an array (TEXT[])

read_json(path TEXT or TEXT[], optional_parameters) returns SETOF duckdb.row #

Reads a JSON file from a storage.

Example 21.1.

    SELECT * FROM read_parquet('file.parquet');
    SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
    SELECT COUNT(*) FROM read_parquet('file.parquet');

Required parameters:

Name	Type	Description
`path`	`text` or `text[]`	The path to the storage with JSON files to read. It can be a glob or array of files.

Optional parameters mirror the DuckDB read_json function. To specify optional parameters, use parameter := 'value'.

Note

columns is not currently supported.

iceberg_scan(path TEXT, optional_parameters) returns SETOF duckdb.row #

Reads an iceberg table from a storage.

Example 21.2.

  SELECT * FROM iceberg_scan('data/iceberg/table');
  SELECT r['id'], r['name'] FROM iceberg_scan('data/iceberg/table') r WHERE r['age'] > 21;
  SELECT COUNT(*) FROM iceberg_scan('data/iceberg/table');

To use iceberg_scan, enable the iceberg extension:

  SELECT duckdb.install_extension('iceberg');

Required parameters:

Name	Type	Description
`path`	`text`	The path to the storage with the iceberg table to read.

Optional parameters mirror the DuckDB iceberg_scan function. To specify optional parameters, use parameter := 'value'.

Optional parameters:

Name	Type	Default value	Description
`allowed_moved_paths`	`boolean`	false	Ensures that path resolution is performed, which allows scanning iceberg tables that are moved.
`mode`	`text`	''	—
`metadata_compression_codec`	`text`	`'none'`	—
`skip_schema_inference`	`boolean`	false	—
`version`	`text`	`'version-hint.text'`	—
`version_name_format`	`text`	`'v%s%s.metadata.json,%s%s.metadata.json'`	—

iceberg_metadata(path TEXT, optional_parameters) returns SETOF iceberg_metadata_record #

Returns metadata of an iceberg table. Metadata is returned as a set of icerberg_metadata_record, which is defined as follows:

  CREATE TYPE duckdb.iceberg_metadata_record AS (
    manifest_path TEXT,
    manifest_sequence_number NUMERIC,
    manifest_content TEXT,
    status TEXT,
    content TEXT,
    file_path TEXT
  );

To use iceberg_metadata, enable the iceberg extension:

  SELECT duckdb.install_extension('iceberg');

Required parameters:

Name	Type	Description
`path`	`text`	The path to the storage with the iceberg table to read.

Optional parameters mirror the DuckDB iceberg_metadata function. To specify optional parameters, use parameter := 'value'.

Optional parameters:

Name	Type	Default value	Description
`allowed_moved_paths`	`boolean`	false	Ensures that path resolution is performed, which allows scanning iceberg tables that are moved.
`metadata_compression_codec`	`text`	`'none'`	—
`skip_schema_inference`	`boolean`	false	—
`version`	`text`	`'version-hint.text'`	—
`version_name_format`	`text`	`'v%s%s.metadata.json,%s%s.metadata.json'`

iceberg_snapshots (path TEXT, optional_parameters) returns SETOF iceberg_snapshot_record #

Reads snapshot information from an iceberg table.

Returns snapshot metadata for an iceberg table, which can be useful for "time travel" queries and understanding table history.

Example 21.3.

  SELECT * FROM iceberg_snapshots('data/iceberg/table');

To use iceberg_snapshots, enable the iceberg extension:

  SELECT duckdb.install_extension('iceberg');

Required parameters:

Name	Type	Description
`path`	`text`	The path to the storage with the iceberg table to read.

Optional parameters mirror the DuckDB iceberg_snapshots function. To specify optional parameters, use parameter := 'value'.

Optional parameters:

Name	Type	Default value
`metadata_compression_codec`	`text`	`'none'`
`skip_schema_inference`	`boolean`	false
`version`	`text`	`'version-hint.text'`
`version_name_format`	`text`	`'v%s%s.metadata.json,%s%s.metadata.json'`

delta_scan(path TEXT) returns SETOF duckdb.row #

Reads a Delta dataset from a storage.

Example 21.4.

  SELECT * FROM delta_scan('/path/to/delta/dataset');
  SELECT r['id'], r['name'] FROM delta_scan('/path/to/delta/dataset') r WHERE r['age'] > 21;
  SELECT COUNT(*) FROM delta_scan('/path/to/delta/dataset');

To use delta_scan, enable the delta extension:

  SELECT duckdb.install_extension('delta');

Required parameters:

Name	Type	Description
`path`	`text`	The path to the storage with the Delta dataset to read.