> Integrate ClickHouse with Databricks

# Integrating ClickHouse with Databricks

export const ClickHouseSupportedBadge = () => {
  return <div className="ClickHouseSupportedBadge">
            <div className="ClickHouseSupportedIcon">
                <svg width="16" height="16" viewBox="0 0 16 16" fill="none" xmlns="http://www.w3.org/2000/svg">
                    <path d="M1.30762 1.39073C1.30762 1.3103 1.37465 1.22986 1.46849 1.22986H2.64824C2.72868 1.22986 2.80912 1.29689 2.80912 1.39073V14.4886C2.80912 14.5691 2.74209 14.6495 2.64824 14.6495H1.46849C1.38805 14.6495 1.30762 14.5825 1.30762 14.4886V1.39073Z" fill="currentColor" />
                    <path d="M4.2832 1.39073C4.2832 1.3103 4.35023 1.22986 4.44408 1.22986H5.62383C5.70427 1.22986 5.7847 1.29689 5.7847 1.39073V14.4886C5.7847 14.5691 5.71767 14.6495 5.62383 14.6495H4.44408C4.36364 14.6495 4.2832 14.5825 4.2832 14.4886V1.39073Z" fill="currentColor" />
                    <path d="M7.25977 1.39073C7.25977 1.3103 7.3268 1.22986 7.42064 1.22986H8.60039C8.68083 1.22986 8.76127 1.29689 8.76127 1.39073V14.4886C8.76127 14.5691 8.69423 14.6495 8.60039 14.6495H7.42064C7.3402 14.6495 7.25977 14.5825 7.25977 14.4886V1.39073Z" fill="currentColor" />
                    <path d="M10.2354 1.39073C10.2354 1.3103 10.3024 1.22986 10.3962 1.22986H11.576C11.6564 1.22986 11.7369 1.29689 11.7369 1.39073V14.4886C11.7369 14.5691 11.6698 14.6495 11.576 14.6495H10.3962C10.3158 14.6495 10.2354 14.5825 10.2354 14.4886V1.39073Z" fill="currentColor" />
                    <path d="M13.2256 6.6057C13.2256 6.52526 13.2926 6.44482 13.3865 6.44482H14.5662C14.6466 6.44482 14.7271 6.51186 14.7271 6.6057V9.27354C14.7271 9.35398 14.6601 9.43442 14.5662 9.43442H13.3865C13.306 9.43442 13.2256 9.36739 13.2256 9.27354V6.6057Z" fill="currentColor" />
                </svg>
            </div>
            ClickHouse Supported
        </div>;
};

export const Image = ({img, alt, size}) => {
  return <Frame>
      <img src={img} alt={alt} />
    </Frame>;
};

The ClickHouse Spark connector can be used from Databricks clusters. This guide covers Databricks-specific setup, installation, and usage patterns.

<h2 id="api-selection">
  API Selection for Databricks
</h2>

By default, Databricks uses Unity Catalog, which blocks Spark catalog registration. In this case, you **must** use the **TableProvider API** (format-based access).

However, if you disable Unity Catalog by creating a cluster with **No isolation shared** access mode, you can use the **Catalog API** instead. The Catalog API provides centralized configuration and native Spark SQL integration.

| Unity Catalog Status               | Recommended API                  | Notes                                                   |
| ---------------------------------- | -------------------------------- | ------------------------------------------------------- |
| **Enabled** (default)              | TableProvider API (format-based) | Unity Catalog blocks Spark catalog registration         |
| **Disabled** (No isolation shared) | Catalog API                      | Requires cluster with "No isolation shared" access mode |

<h2 id="installation">
  Installation on Databricks
</h2>

<h3 id="installation-ui">
  Option 1: Upload JAR via Databricks UI
</h3>

1. Build or [download](https://repo1.maven.org/maven2/com/clickhouse/spark/) the runtime JAR:
   ```text theme={null}
   clickhouse-spark-runtime-{{ spark_binary_version }}_{{ scala_binary_version }}-{{ stable_version }}.jar
   ```

2. Upload the JAR to your Databricks workspace:
   * Go to **Workspace** → Navigate to your desired folder
   * Click **Upload** → Select the JAR file
   * The JAR will be stored in your workspace

3. Install the library on your cluster:
   * Go to **Compute** → Select your cluster
   * Click the **Libraries** tab
   * Click **Install New**
   * Select **DBFS** or **Workspace** → Navigate to the uploaded JAR file
   * Click **Install**

<Image img={require('@site/images/integrations/data-ingestion/apache-spark/databricks/databricks-libraries-tab.png')} alt="Databricks Libraries tab" />

<Image img={require('@site/images/integrations/data-ingestion/apache-spark/databricks/databricks-install-from-volume.png')} alt="Installing library from workspace volume" />

4. Restart the cluster to load the library
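
The runtime JAR name in step 1 follows the pattern `clickhouse-spark-runtime-<spark_binary_version>_<scala_binary_version>-<version>.jar`, and Maven Central stores it under the standard Maven repository layout. The sketch below uses hypothetical helpers that are not part of the connector. They assemble the file name, its download URL, and the corresponding Maven coordinates from a full Spark version string such as the value of `spark.version` in a notebook. The Scala binary version and connector version are inputs you must supply yourself.

```python
from typing import Tuple

# Hypothetical helpers (not part of the connector). They assume the standard Maven
# repository layout: <group path>/<artifactId>/<version>/<artifactId>-<version>.jar
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"
GROUP_ID = "com.clickhouse.spark"


def spark_binary_version(full_version: str) -> str:
    """Reduce a full Spark version such as "3.5.0" to its binary version "3.5"."""
    major, minor = full_version.split(".")[:2]
    return f"{major}.{minor}"


def artifact_id(spark_version: str, scala_binary_version: str) -> str:
    """Artifact name, e.g. clickhouse-spark-runtime-3.5_2.12."""
    return f"clickhouse-spark-runtime-{spark_binary_version(spark_version)}_{scala_binary_version}"


def runtime_jar(spark_version: str, scala_binary_version: str, connector_version: str) -> Tuple[str, str]:
    """Return (file name, Maven Central download URL) of the runtime JAR."""
    artifact = artifact_id(spark_version, scala_binary_version)
    file_name = f"{artifact}-{connector_version}.jar"
    group_path = GROUP_ID.replace(".", "/")  # com.clickhouse.spark -> com/clickhouse/spark
    url = f"{MAVEN_CENTRAL}/{group_path}/{artifact}/{connector_version}/{file_name}"
    return file_name, url


def maven_coordinate(spark_version: str, scala_binary_version: str, connector_version: str) -> str:
    """groupId:artifactId:version string accepted by the Maven library installer."""
    return f"{GROUP_ID}:{artifact_id(spark_version, scala_binary_version)}:{connector_version}"
```

In a notebook you could call `runtime_jar(spark.version, "2.12", "<connector version>")`. The Scala binary version must match the one your Databricks Runtime is built with, which is listed in the Databricks Runtime release notes. `<connector version>` is a placeholder for a released connector version.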

<h3 id="installation-cli">
  Option 2: Install via Databricks CLI
</h3>

```bash theme={null}
# Upload JAR to DBFS
databricks fs cp clickhouse-spark-runtime-{{ spark_binary_version }}_{{ scala_binary_version }}-{{ stable_version }}.jar \
  dbfs:/FileStore/jars/

# Install on cluster
databricks libraries install \
  --cluster-id <your-cluster-id> \
  --jar dbfs:/FileStore/jars/clickhouse-spark-runtime-{{ spark_binary_version }}_{{ scala_binary_version }}-{{ stable_version }}.jar
```

<h3 id="installation-maven">
  Option 3: Maven Coordinates (Recommended)
</h3>

1. Navigate to your Databricks workspace:
   * Go to **Compute** → Select your cluster
   * Click the **Libraries** tab
   * Click **Install New**
   * Select the **Maven** tab

2. Add the Maven coordinates:

   ```text theme={null}
   com.clickhouse.spark:clickhouse-spark-runtime-{{ spark_binary_version }}_{{ scala_binary_version }}:{{ stable_version }}
   ```

<Image img={require('@site/images/integrations/data-ingestion/apache-spark/databricks/databricks-maven-tab.png')} alt="Databricks Maven libraries configuration" />

3. Click **Install** and restart the cluster to load the library
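
Options 2 and 3 can also be scripted against the Databricks Libraries REST API (`POST /api/2.0/libraries/install`). The following is a minimal standard-library sketch, not an official client. The workspace URL, cluster ID, and personal access token are placeholders you supply, and `install_payload` and `install_library` are hypothetical helper names. Databricks performs the installation asynchronously after the request returns. As with the UI steps, restart the cluster before using the connector.

```python
import json
import urllib.request


def install_payload(cluster_id, jar_path=None, maven_coordinates=None):
    """Build the request body for POST /api/2.0/libraries/install."""
    libraries = []
    if jar_path:
        libraries.append({"jar": jar_path})  # e.g. a DBFS path to the uploaded JAR
    if maven_coordinates:
        libraries.append({"maven": {"coordinates": maven_coordinates}})
    if not libraries:
        raise ValueError("Provide jar_path and/or maven_coordinates")
    return {"cluster_id": cluster_id, "libraries": libraries}


def install_library(workspace_url, token, payload):
    """Send the install request and return the HTTP status code."""
    request = urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/libraries/install",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, pass the Maven coordinates from Option 3 as `maven_coordinates`, or the DBFS path from Option 2 as `jar_path`.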

<h2 id="tableprovider-api">
  Using TableProvider API
</h2>

When Unity Catalog is enabled (default), you **must** use the TableProvider API (format-based access) because Unity Catalog blocks Spark catalog registration. If you've disabled Unity Catalog by using a cluster with "No isolation shared" access mode, you can use the [Catalog API](/integrations/connectors/data-ingestion/apache-spark/spark-native-connector#register-the-catalog-required) instead.

<h3 id="reading-data-table-provider">
  Reading data
</h3>

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Read from ClickHouse using TableProvider API
    df = spark.read \
        .format("clickhouse") \
        .option("host", "your-clickhouse-cloud-host.clickhouse.cloud") \
        .option("protocol", "https") \
        .option("http_port", "8443") \
        .option("database", "default") \
        .option("table", "events") \
        .option("user", "default") \
        .option("password", dbutils.secrets.get(scope="clickhouse", key="password")) \
        .option("ssl", "true") \
        .load()

    # The DataFrame schema is taken from the ClickHouse table definition
    df.display()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={null}
    val df = spark.read
      .format("clickhouse")
      .option("host", "your-clickhouse-cloud-host.clickhouse.cloud")
      .option("protocol", "https")
      .option("http_port", "8443")
      .option("database", "default")
      .option("table", "events")
      .option("user", "default")
      .option("password", dbutils.secrets.get(scope="clickhouse", key="password"))
      .option("ssl", "true")
      .load()

    df.show()
    ```
  </Tab>
</Tabs>

<h3 id="writing-data-unity">
  Writing data
</h3>

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Write to ClickHouse; the table is created automatically if it doesn't exist
    (
        df.write
        .format("clickhouse")
        .option("host", "your-clickhouse-cloud-host.clickhouse.cloud")
        .option("protocol", "https")
        .option("http_port", "8443")
        .option("database", "default")
        .option("table", "events_copy")
        .option("user", "default")
        .option("password", dbutils.secrets.get(scope="clickhouse", key="password"))
        .option("ssl", "true")
        .option("order_by", "id")  # Required: specify ORDER BY when creating a new table
        .option("settings.allow_nullable_key", "1")  # Required for ClickHouse Cloud if ORDER BY has nullable columns
        .mode("append")
        .save()
    )
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={null}
    df.write
      .format("clickhouse")
      .option("host", "your-clickhouse-cloud-host.clickhouse.cloud")
      .option("protocol", "https")
      .option("http_port", "8443")
      .option("database", "default")
      .option("table", "events_copy")
      .option("user", "default")
      .option("password", dbutils.secrets.get(scope="clickhouse", key="password"))
      .option("ssl", "true")
      .option("order_by", "id")  // Required: specify ORDER BY when creating a new table
      .option("settings.allow_nullable_key", "1")  // Required for ClickHouse Cloud if ORDER BY has nullable columns
      .mode("append")
      .save()
    ```
  </Tab>
</Tabs>

<Note>
  The examples above read the password from a preconfigured Databricks secret scope named `clickhouse` (key `password`). For setup instructions, see the Databricks [Secret management documentation](https://docs.databricks.com/aws/en/security/secrets/).
</Note>
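
When several writes share the same target settings, the write-specific options can be kept separate from the connection options. The sketch below is a hypothetical helper, not a connector API. It merges a dictionary of connection options with the table name, the `order_by` option, and any ClickHouse table settings, and converts each setting value to a string. It assumes that other table settings use the same `settings.` prefix as `settings.allow_nullable_key` in the write examples above. PySpark's `DataFrameWriter.options(**opts)` then applies everything in one call.

```python
def clickhouse_write_options(connection, table, order_by, **table_settings):
    """Merge connection options with write-specific options for format("clickhouse")."""
    options = dict(connection)  # copy so the shared connection dict is not mutated
    options["table"] = table
    options["order_by"] = order_by  # ORDER BY used when the connector creates the table
    for name, value in table_settings.items():
        options[f"settings.{name}"] = str(value)  # e.g. settings.allow_nullable_key
    return options


# Usage in a notebook, where conn holds host, protocol, http_port, database, user, password, ssl:
# opts = clickhouse_write_options(conn, "events_copy", "id", allow_nullable_key=1)
# df.write.format("clickhouse").options(**opts).mode("append").save()
```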

<h2 id="considerations">
  Databricks-specific considerations
</h2>

<h3 id="access-mode">
  Access mode requirements
</h3>

When Unity Catalog is enabled, the ClickHouse Spark Connector requires **Dedicated** (formerly Single User) access mode. **Standard** (formerly Shared) access mode isn't supported in that configuration because Databricks blocks external DataSource V2 connectors there. With Unity Catalog disabled, both access modes are supported.

| Access Mode             | Unity Catalog | Supported |
| ----------------------- | ------------- | --------- |
| Dedicated (Single User) | Enabled       | ✅ Yes     |
| Dedicated (Single User) | Disabled      | ✅ Yes     |
| Standard (Shared)       | Enabled       | ❌ No      |
| Standard (Shared)       | Disabled      | ✅ Yes     |
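
If you create clusters through the Databricks Clusters API instead of the UI, Dedicated access mode corresponds to `data_security_mode` set to `SINGLE_USER`, which is the API value for the mode formerly called Single User, together with `single_user_name`. The sketch below only builds a request body for the Clusters API `create` endpoint. The helper name is hypothetical, and the cluster name, runtime version key, node type, and user name are workspace-specific placeholders.

```python
def dedicated_cluster_spec(cluster_name, spark_version, node_type_id, user_name, num_workers=1):
    """Request body for a Clusters API create call using Dedicated (single user) access mode."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,  # a Databricks Runtime version key for your workspace
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "data_security_mode": "SINGLE_USER",  # Dedicated (formerly Single User) access mode
        "single_user_name": user_name,  # the only user allowed to run commands on the cluster
    }
```

Send this body as JSON to the Clusters API `create` endpoint using a personal access token.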

<h3 id="secret-management">
  Secret management
</h3>

Use Databricks secret scopes to securely store ClickHouse credentials:

```python theme={null}
# Access secrets
password = dbutils.secrets.get(scope="clickhouse", key="password")
```

For setup instructions, see the Databricks [Secret management documentation](https://docs.databricks.com/aws/en/security/secrets/).
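
`dbutils` is only defined inside Databricks notebooks. If the same code also runs elsewhere (for example, in local unit tests), one option is a small wrapper that falls back to an environment variable. This is a hypothetical helper, and the variable name `CLICKHOUSE_PASSWORD` is an assumption, not a connector convention. Define it in the notebook itself: in a separately imported Python module, `dbutils` is not a global, so the fallback would apply there too.

```python
import os


def clickhouse_password(scope="clickhouse", key="password", env_var="CLICKHOUSE_PASSWORD"):
    """Read the ClickHouse password from a Databricks secret scope, else from an env var."""
    try:
        # dbutils is injected into Databricks notebooks; it is undefined elsewhere.
        return dbutils.secrets.get(scope=scope, key=key)  # noqa: F821
    except NameError:
        value = os.environ.get(env_var)
        if value is None:
            raise RuntimeError(f"No Databricks secret scope available and {env_var} is not set")
        return value
```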

<h3 id="clickhouse-cloud">
  ClickHouse Cloud connection
</h3>

When connecting to ClickHouse Cloud from Databricks:

1. Use **HTTPS protocol** (`protocol: https`, `http_port: 8443`)
2. Enable **SSL** (`ssl: true`)
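
Because these settings repeat on every read and write, one option is to keep them in a single dictionary and apply them with `DataFrameReader.options(**opts)` or `DataFrameWriter.options(**opts)`. The following is a minimal sketch that assumes the option names used in the examples on this page. The helper itself is hypothetical.

```python
def clickhouse_cloud_options(host, password, database="default", user="default"):
    """Connection options for format("clickhouse") against ClickHouse Cloud."""
    return {
        "host": host,
        "protocol": "https",  # ClickHouse Cloud is reached over HTTPS
        "http_port": "8443",  # HTTPS port for ClickHouse Cloud
        "ssl": "true",
        "database": database,
        "user": user,
        "password": password,
    }


# Usage in a Databricks notebook:
# opts = clickhouse_cloud_options(
#     "your-clickhouse-cloud-host.clickhouse.cloud",
#     dbutils.secrets.get(scope="clickhouse", key="password"),
# )
# df = spark.read.format("clickhouse").options(**opts).option("table", "events").load()
```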

<h2 id="examples">
  Examples
</h2>

<h3 id="workflow-example">
  Complete workflow example
</h3>

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # On Databricks, `spark` is already created for each notebook. Static settings such as
    # spark.jars.packages have no effect on an existing session, so install the connector
    # as a cluster library (see Installation) instead of configuring it here.
    spark = SparkSession.builder.getOrCreate()

    # Read from ClickHouse
    df = spark.read \
        .format("clickhouse") \
        .option("host", "your-host.clickhouse.cloud") \
        .option("protocol", "https") \
        .option("http_port", "8443") \
        .option("database", "default") \
        .option("table", "source_table") \
        .option("user", "default") \
        .option("password", dbutils.secrets.get(scope="clickhouse", key="password")) \
        .option("ssl", "true") \
        .load()

    # Transform data
    transformed_df = df.filter(col("status") == "active")

    # Write to ClickHouse
    transformed_df.write \
        .format("clickhouse") \
        .option("host", "your-host.clickhouse.cloud") \
        .option("protocol", "https") \
        .option("http_port", "8443") \
        .option("database", "default") \
        .option("table", "target_table") \
        .option("user", "default") \
        .option("password", dbutils.secrets.get(scope="clickhouse", key="password")) \
        .option("ssl", "true") \
        .option("order_by", "id") \
        .mode("append") \
        .save()
    ```
  </Tab>

  <Tab title="Scala">
    ```scala theme={null}
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    // On Databricks, `spark` is already created for each notebook. Static settings such as
    // spark.jars.packages have no effect on an existing session, so install the connector
    // as a cluster library (see Installation) instead of configuring it here.
    val spark = SparkSession.builder.getOrCreate()

    // Read from ClickHouse
    val df = spark.read
      .format("clickhouse")
      .option("host", "your-host.clickhouse.cloud")
      .option("protocol", "https")
      .option("http_port", "8443")
      .option("database", "default")
      .option("table", "source_table")
      .option("user", "default")
      .option("password", dbutils.secrets.get(scope="clickhouse", key="password"))
      .option("ssl", "true")
      .load()

    // Transform data
    val transformedDF = df.filter(col("status") === "active")

    // Write to ClickHouse
    transformedDF.write
      .format("clickhouse")
      .option("host", "your-host.clickhouse.cloud")
      .option("protocol", "https")
      .option("http_port", "8443")
      .option("database", "default")
      .option("table", "target_table")
      .option("user", "default")
      .option("password", dbutils.secrets.get(scope="clickhouse", key="password"))
      .option("ssl", "true")
      .option("order_by", "id")
      .mode("append")
      .save()
    ```
  </Tab>
</Tabs>

<h2 id="related">
  Related documentation
</h2>

* [Spark Native Connector Guide](/integrations/connectors/data-ingestion/apache-spark/spark-native-connector) - Complete connector documentation
* [TableProvider API Documentation](/integrations/connectors/data-ingestion/apache-spark/spark-native-connector#using-the-tableprovider-api) - Format-based access details
* [Catalog API Documentation](/integrations/connectors/data-ingestion/apache-spark/spark-native-connector#register-the-catalog-required) - Catalog-based access details
