> ## Documentation Index
> Fetch the complete documentation index at: https://private-7c7dfe99-fix-nav-issues.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

> This engine provides a read-only integration with existing Apache Iceberg tables in Amazon S3, Azure, HDFS and locally stored tables.

# Iceberg table engine

<Warning>
  We recommend using the [Iceberg Table Function](/reference/functions/table-functions/iceberg) for working with Iceberg data in ClickHouse. The Iceberg Table Function currently provides sufficient functionality, offering a partial read-only interface for Iceberg tables.

  The Iceberg Table Engine is available but may have limitations. ClickHouse wasn't originally designed to support tables with externally changing schemas, which can affect the functionality of the Iceberg Table Engine. As a result, some features that work with regular tables may be unavailable or may not function correctly, especially when using the old analyzer.

  For optimal compatibility, we suggest using the Iceberg Table Function while we continue to improve support for the Iceberg Table Engine.
</Warning>

This engine provides a read-only integration with existing Apache [Iceberg](https://iceberg.apache.org/) tables in Amazon S3, Azure, HDFS and locally stored tables.

<h2 id="create-table">
  Create table
</h2>

Note that the Iceberg table must already exist in the storage, this command does not take DDL parameters to create a new table.

```sql theme={null}
CREATE TABLE iceberg_table_s3
    ENGINE = IcebergS3(url,  [, NOSIGN | access_key_id, secret_access_key, [session_token]], format, [,compression], [,extra_credentials])

CREATE TABLE iceberg_table_azure
    ENGINE = IcebergAzure(connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression])

CREATE TABLE iceberg_table_hdfs
    ENGINE = IcebergHDFS(path_to_table, [,format] [,compression_method])

CREATE TABLE iceberg_table_local
    ENGINE = IcebergLocal(path_to_table, [,format] [,compression_method])
```

<h2 id="engine-arguments">
  Engine arguments
</h2>

Description of the arguments coincides with description of arguments in engines `S3`, `AzureBlobStorage`, `HDFS` and `File` correspondingly.
`format` stands for the format of data files in the Iceberg table.

For `IcebergS3`, an optional `extra_credentials` parameter can be used to pass a `role_arn` for role-based access in ClickHouse Cloud. See [Secure S3](/products/cloud/guides/data-sources/accessing-s3-data-securely) for configuration steps.

Engine parameters can be specified using [Named Collections](/concepts/features/configuration/server-config/named-collections)

<h3 id="example">
  Example
</h3>

```sql theme={null}
CREATE TABLE iceberg_table ENGINE=IcebergS3('http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')
```

Using named collections:

```xml theme={null}
<clickhouse>
    <named_collections>
        <iceberg_conf>
            <url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
            <access_key_id>test</access_key_id>
            <secret_access_key>test</secret_access_key>
        </iceberg_conf>
    </named_collections>
</clickhouse>
```

```sql theme={null}
CREATE TABLE iceberg_table ENGINE=IcebergS3(iceberg_conf, filename = 'test_table')

```

<h2 id="aliases">
  Aliases
</h2>

The `Iceberg` table engine auto-detects the storage backend from the `disk` setting and dispatches to `IcebergS3`, `IcebergAzure`, or `IcebergLocal` accordingly. When no `disk` is specified, it defaults to the `IcebergS3` implementation.

<h2 id="data-types">
  Data types
</h2>

The following table shows how Iceberg data types are mapped to ClickHouse data types during schema inference (for reading purposes).

<h3 id="primitive-types">
  Primitive types
</h3>

| Iceberg type       | ClickHouse type        | Notes                                             |
| ------------------ | ---------------------- | ------------------------------------------------- |
| `boolean`          | `Bool`                 |                                                   |
| `int`              | `Int32`                |                                                   |
| `long`, `bigint`   | `Int64`                |                                                   |
| `float`            | `Float32`              |                                                   |
| `double`           | `Float64`              |                                                   |
| `date`             | `Date32`               |                                                   |
| `time`             | `Int64`                | Microseconds since midnight                       |
| `timestamp`        | `DateTime64(6)`        | Microseconds, no timezone                         |
| `timestamptz`      | `DateTime64(6, 'UTC')` | Microseconds, UTC timezone                        |
| `timestamp_ns`     | `DateTime64(9)`        | Nanoseconds, no timezone (since Iceberg v3 only)  |
| `timestamptz_ns`   | `DateTime64(9, 'UTC')` | Nanoseconds, UTC timezone (since Iceberg v3 only) |
| `string`, `binary` | `String`               |                                                   |
| `uuid`             | `UUID`                 |                                                   |
| `fixed(N)`         | `FixedString(N)`       |                                                   |
| `decimal(P, S)`    | `Decimal(P, S)`        |                                                   |

<h3 id="complex-types">
  Complex types
</h3>

| Iceberg type | ClickHouse type |
| ------------ | --------------- |
| `list`       | `Array`         |
| `map`        | `Map`           |
| `struct`     | `Tuple`         |

<h2 id="schema-evolution">
  Schema evolution
</h2>

ClickHouse supports reading Iceberg tables whose schema has evolved over time. This includes tables where columns have been added, removed, or reordered, as well as columns changed from required to nullable. Additionally, the following type casts are supported:

* int -> long
* float -> double
* decimal(P, S) -> decimal(P', S) where P' > P.

Currently, it is not possible to change nested structures or the types of elements within arrays and maps.

To read a table where the schema has changed after its creation with dynamic schema inference, set allow\_dynamic\_metadata\_for\_data\_lakes = true when creating the table.

<h2 id="partition-pruning">
  Partition pruning
</h2>

ClickHouse supports partition pruning during SELECT queries for Iceberg tables, which helps optimize query performance by skipping irrelevant data files. To enable partition pruning, set `use_iceberg_partition_pruning = 1`. For more information about iceberg partition pruning address [https://iceberg.apache.org/spec/#partitioning](https://iceberg.apache.org/spec/#partitioning)

<h2 id="time-travel">
  Time travel
</h2>

ClickHouse supports time travel for Iceberg tables, allowing you to query historical data with a specific timestamp or snapshot ID.

<h2 id="deleted-rows">
  Processing of tables with deleted rows
</h2>

ClickHouse supports reading Iceberg tables that use the following deletion methods:

* [Position deletes](https://iceberg.apache.org/spec/#position-delete-files)
* [Equality deletes](https://iceberg.apache.org/spec/#equality-delete-files) (supported from version 25.8+)

The following deletion method is **not supported**:

* [Deletion vectors](https://iceberg.apache.org/spec/#deletion-vectors) (introduced in v3)

<h3 id="basic-usage">
  Basic usage
</h3>

```sql theme={null}
SELECT * FROM example_table ORDER BY 1 
SETTINGS iceberg_timestamp_ms = 1714636800000
```

```sql theme={null}
SELECT * FROM example_table ORDER BY 1 
SETTINGS iceberg_snapshot_id = 3547395809148285433
```

Note: You cannot specify both `iceberg_timestamp_ms` and `iceberg_snapshot_id` parameters in the same query.

<h3 id="important-considerations">
  Important considerations
</h3>

* **Snapshots** are typically created when:
  * New data is written to the table
  * Some kind of data compaction is performed

* **Schema changes typically don't create snapshots** - This leads to important behaviors when using time travel with tables that have undergone schema evolution.

<h3 id="example-scenarios">
  Example scenarios
</h3>

All scenarios are written in Spark because CH doesn't support writing to Iceberg tables yet.

<h4 id="scenario-1">
  Scenario 1: Schema changes without new snapshots
</h4>

Consider this sequence of operations:

```sql theme={null}
-- Create a table with two columns
 CREATE TABLE IF NOT EXISTS spark_catalog.db.time_travel_example (
 order_number int, 
 product_code string
 ) 
 USING iceberg 
 OPTIONS ('format-version'='2')

-- Insert data into the table
 INSERT INTO spark_catalog.db.time_travel_example VALUES 
   (1, 'Mars')

 ts1 = now() // A piece of pseudo code

-- Alter table to add a new column
 ALTER TABLE spark_catalog.db.time_travel_example ADD COLUMN (price double)

 ts2 = now()

-- Insert data into the table
 INSERT INTO spark_catalog.db.time_travel_example VALUES (2, 'Venus', 100)

  ts3 = now()

-- Query the table at each timestamp
 SELECT * FROM spark_catalog.db.time_travel_example TIMESTAMP AS OF ts1;

+------------+------------+
|order_number|product_code|
+------------+------------+
|           1|        Mars|
+------------+------------+
 SELECT * FROM spark_catalog.db.time_travel_example TIMESTAMP AS OF ts2;

+------------+------------+
|order_number|product_code|
+------------+------------+
|           1|        Mars|
+------------+------------+

 SELECT * FROM spark_catalog.db.time_travel_example TIMESTAMP AS OF ts3;

+------------+------------+-----+
|order_number|product_code|price|
+------------+------------+-----+
|           1|        Mars| NULL|
|           2|       Venus|100.0|
+------------+------------+-----+
```

Query results at different timestamps:

* At ts1 & ts2: Only the original two columns appear
* At ts3: All three columns appear, with NULL for the price of the first row

<h4 id="scenario-2">
  Scenario 2: Historical vs. current schema differences
</h4>

A time travel query at a current moment might show a different schema than the current table:

```sql theme={null}
-- Create a table
  CREATE TABLE IF NOT EXISTS spark_catalog.db.time_travel_example_2 (
  order_number int, 
  product_code string
  ) 
  USING iceberg 
  OPTIONS ('format-version'='2')

-- Insert initial data into the table
  INSERT INTO spark_catalog.db.time_travel_example_2 VALUES (2, 'Venus');

-- Alter table to add a new column
  ALTER TABLE spark_catalog.db.time_travel_example_2 ADD COLUMN (price double);

  ts = now();

-- Query the table at a current moment but using timestamp syntax

  SELECT * FROM spark_catalog.db.time_travel_example_2 TIMESTAMP AS OF ts;

    +------------+------------+
    |order_number|product_code|
    +------------+------------+
    |           2|       Venus|
    +------------+------------+

-- Query the table at a current moment
  SELECT * FROM spark_catalog.db.time_travel_example_2;
    +------------+------------+-----+
    |order_number|product_code|price|
    +------------+------------+-----+
    |           2|       Venus| NULL|
    +------------+------------+-----+
```

This happens because `ALTER TABLE` doesn't create a new snapshot but for the current table Spark takes value of `schema_id` from the latest metadata file, not a snapshot.

<h4 id="scenario-3">
  Scenario 3: Historical vs. current schema differences
</h4>

The second one is that while doing time travel you can't get state of table before any data was written to it:

```sql theme={null}
-- Create a table
  CREATE TABLE IF NOT EXISTS spark_catalog.db.time_travel_example_3 (
  order_number int, 
  product_code string
  ) 
  USING iceberg 
  OPTIONS ('format-version'='2');

  ts = now();

-- Query the table at a specific timestamp
  SELECT * FROM spark_catalog.db.time_travel_example_3 TIMESTAMP AS OF ts; -- Finises with error: Cannot find a snapshot older than ts.
```

In Clickhouse the behavior is consistent with Spark. You can mentally replace Spark Select queries with Clickhouse Select queries and it will work the same way.

<h2 id="metadata-file-resolution">
  Metadata file resolution
</h2>

When using the `Iceberg` table engine in ClickHouse, the system needs to locate the correct metadata.json file that describes the Iceberg table structure. Here's how this resolution process works:

<h3 id="candidate-search">
  Candidates search
</h3>

1. **Direct Path Specification**:

* If you set `iceberg_metadata_file_path`, the system will use this exact path by combining it with the Iceberg table directory path.
* When this setting is provided, all other resolution settings are ignored.

2. **Table UUID Matching**:

* If `iceberg_metadata_table_uuid` is specified, the system will:
  * Look only at `.metadata.json` files in the `metadata` directory
  * Filter for files containing a `table-uuid` field matching your specified UUID (case-insensitive)

3. **Default Search**:

* If neither of the above settings are provided, all `.metadata.json` files in the `metadata` directory become candidates

<h3 id="most-recent-file">
  Selecting the most recent file
</h3>

After identifying candidate files using the above rules, the system determines which one is the most recent:

* If `iceberg_recent_metadata_file_by_last_updated_ms_field` is enabled:
  * The file with the largest `last-updated-ms` value is selected

* Otherwise:
  * The file with the highest version number is selected
  * (Version appears as `V` in filenames formatted as `V.metadata.json` or `V-uuid.metadata.json`)

**Note**: All mentioned settings (unless explicitly specified otherwise) are engine-level settings and must be specified during table creation as shown below:

```sql theme={null}
CREATE TABLE example_table ENGINE = Iceberg(
    's3://bucket/path/to/iceberg_table'
) SETTINGS iceberg_metadata_table_uuid = '6f6f6407-c6a5-465f-a808-ea8900e35a38';
```

**Note**: While Iceberg Catalogs typically handle metadata resolution, the `Iceberg` table engine in ClickHouse directly interprets files stored in S3 as Iceberg tables, which is why understanding these resolution rules is important.

<h2 id="data-cache">
  Data cache
</h2>

`Iceberg` table engine and table function support data caching same as `S3`, `AzureBlobStorage`, `HDFS` storages. See [here](/reference/engines/table-engines/integrations/s3#data-cache).

<h2 id="metadata-cache">
  Metadata cache
</h2>

`Iceberg` table engine and table function support metadata cache storing the information of manifest files, manifest list and metadata json. The cache is stored in memory. This feature is controlled by setting `use_iceberg_metadata_files_cache`, which is enabled by default.

<h2 id="async-metadata-prefetch">
  Asynchronous metadata prefetching
</h2>

Asynchronous metadata prefetching can be enabled at `Iceberg` table creation by setting `iceberg_metadata_async_prefetch_period_ms`. If set to 0 (default) or if metadata caching is not enabled, the asynchronous prefetching is disabled.
In order to enable this feature, a non-zero value of milliseconds should be given. It represents interval between prefetching cycles.

If enabled, the server will run a recurring background operation to list the remote catalog and to detect new metadata version. It will then parse it and recursively walk the snapshot, fetching active manifest list files and manifest files.
The files already available at the metadata cache, won't be downloaded again. At the end of each prefetching cycle, the latest metadata snapshot is available at the metadata cache.

```sql theme={null}
CREATE TABLE example_table ENGINE = Iceberg(
    's3://bucket/path/to/iceberg_table'
) SETTINGS
    iceberg_metadata_async_prefetch_period_ms = 60000;
```

In order to make the most of asynchronous metadata prefetching at read operations, `iceberg_metadata_staleness_ms` parameter should be specified as Query or Session parameter. By default (0 - not specified) in the context of each query, the server will fetch latest metadata from the remote catalog.
By specifying tolerance to metadata staleness, the server is allowed to use the cached version of metadata snapshot without calling the remote catalog. If there's metadata version in cache, and it has been downloaded within the given window of staleness, it will be used to process the query.
Otherwise the latest version will be fetched from the remote catalog.

```sql theme={null}
SELECT count() FROM icebench_table WHERE ...
SETTINGS iceberg_metadata_staleness_ms=120000
```

**Note**: Asynchronous metadata prefetching runs at `ICEBERG_SCEDULE_POOL`, which is server-side threadpool for background operations on active `Iceberg` tables. The size of this threadpool is controlled by `iceberg_background_schedule_pool_size` server configuration parameter (default is 10).

**Note**: Current expectation is that metadata cache size is sufficient to hold the latest metadata snapshot in full for all active tables, if asynchronous prefetching is enabled.

<h2 id="see-also">
  See also
</h2>

* [iceberg table function](/reference/functions/table-functions/iceberg)
