> Google Cloud Storage (GCS) Backed MergeTree

# Integrate Google Cloud Storage with ClickHouse

export const Image = ({img, alt, size}) => {
  return <Frame>
      <img src={img} alt={alt} />
    </Frame>;
};

<Note>
  If you're using ClickHouse Cloud on [Google Cloud](https://cloud.google.com), this page doesn't apply as your services will already be using [Google Cloud Storage](https://cloud.google.com/storage). If you're looking to `SELECT` or `INSERT` data from GCS, please see the [`gcs` table function](/reference/functions/table-functions/gcs).
</Note>

GCS is an attractive storage option if you want to separate storage and compute. To support this, ClickHouse can use GCS as the storage for a MergeTree engine. This lets you combine the scalability and cost benefits of GCS with the insert and query performance of the MergeTree engine.

<h2 id="gcs-backed-mergetree">
  GCS backed MergeTree
</h2>

<h3 id="creating-a-disk">
  Creating a disk
</h3>

To use a GCS bucket as a disk, we must first declare it within the ClickHouse configuration in a file under `config.d`. An example of a GCS disk declaration is shown below. This configuration includes multiple sections to configure the GCS "disk", the cache, and the policy that is specified in DDL queries when tables are to be created on the GCS disk. Each of these is described below.

<h4 id="storage_configuration--disks--gcs">
  Storage configuration > disks > gcs
</h4>

This part of the configuration is shown in the highlighted section and specifies:

* The type of the disk, which is `s3` because the S3 API is in use
* The endpoint as provided by GCS
* The service account HMAC key and secret
* The metadata path on the local disk

```xml highlight={5-10} theme={null}
<clickhouse>
    <storage_configuration>
        <disks>
            <gcs>
                <support_batch_delete>true</support_batch_delete>
                <type>s3</type>
                <endpoint>https://storage.googleapis.com/BUCKET NAME/FOLDER NAME/</endpoint>
                <access_key_id>SERVICE ACCOUNT HMAC KEY</access_key_id>
                <secret_access_key>SERVICE ACCOUNT HMAC SECRET</secret_access_key>
                <metadata_path>/var/lib/clickhouse/disks/gcs/</metadata_path>
            </gcs>
        </disks>
        <policies>
            <gcs_main>
                <volumes>
                    <main>
                        <disk>gcs</disk>
                    </main>
                </volumes>
            </gcs_main>
        </policies>
    </storage_configuration>
</clickhouse>
```

<h4 id="storage_configuration--disks--cache">
  Storage configuration > disks > cache
</h4>

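The example configuration highlighted below enables a 10Gi cache for the disk `gcs`, stored on the local filesystem under the configured `path`. Note that the policy in this example references the `gcs_cache` disk rather than `gcs`, so that the table's data is accessed through the cache.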

```xml highlight={12-17} theme={null}
<clickhouse>
    <storage_configuration>
        <disks>
            <gcs>
                <support_batch_delete>true</support_batch_delete>
                <type>s3</type>
                <endpoint>https://storage.googleapis.com/BUCKET NAME/FOLDER NAME/</endpoint>
                <access_key_id>SERVICE ACCOUNT HMAC KEY</access_key_id>
                <secret_access_key>SERVICE ACCOUNT HMAC SECRET</secret_access_key>
                <metadata_path>/var/lib/clickhouse/disks/gcs/</metadata_path>
            </gcs>
            <gcs_cache>
                <type>cache</type>
                <disk>gcs</disk>
                <path>/var/lib/clickhouse/disks/gcs_cache/</path>
                <max_size>10Gi</max_size>
            </gcs_cache>
        </disks>
        <policies>
            <gcs_main>
                <volumes>
                    <main>
                        <disk>gcs_cache</disk>
                    </main>
                </volumes>
            </gcs_main>
        </policies>
    </storage_configuration>
</clickhouse>
```

<h4 id="storage_configuration--policies--gcs_main">
  Storage configuration > policies > gcs\_main
</h4>

Storage configuration policies allow choosing where data is stored.  The policy highlighted below allows data to be stored on the disk `gcs` by specifying the policy `gcs_main`.  For example, `CREATE TABLE ... SETTINGS storage_policy='gcs_main'`.

```xml highlight={14-20} theme={null}
<clickhouse>
    <storage_configuration>
        <disks>
            <gcs>
                <support_batch_delete>true</support_batch_delete>
                <type>s3</type>
                <endpoint>https://storage.googleapis.com/BUCKET NAME/FOLDER NAME/</endpoint>
                <access_key_id>SERVICE ACCOUNT HMAC KEY</access_key_id>
                <secret_access_key>SERVICE ACCOUNT HMAC SECRET</secret_access_key>
                <metadata_path>/var/lib/clickhouse/disks/gcs/</metadata_path>
            </gcs>
        </disks>
        <policies>
            <gcs_main>
                <volumes>
                    <main>
                        <disk>gcs</disk>
                    </main>
                </volumes>
            </gcs_main>
        </policies>
    </storage_configuration>
</clickhouse>
```

A complete list of settings relevant to this disk declaration can be found [here](/reference/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-s3).

<h3 id="creating-a-table">
  Creating a table
</h3>

Assuming your disk is configured to use a bucket with write access, you should be able to create a table like the one in the example below. For brevity, we use a subset of the NYC taxi columns and stream data directly into the GCS-backed table:

```sql highlight={20} theme={null}
CREATE TABLE trips_gcs
(
   `trip_id` UInt32,
   `pickup_date` Date,
   `pickup_datetime` DateTime,
   `dropoff_datetime` DateTime,
   `pickup_longitude` Float64,
   `pickup_latitude` Float64,
   `dropoff_longitude` Float64,
   `dropoff_latitude` Float64,
   `passenger_count` UInt8,
   `trip_distance` Float64,
   `tip_amount` Float32,
   `total_amount` Float32,
   `payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(pickup_date)
ORDER BY pickup_datetime
SETTINGS storage_policy='gcs_main'
```

```sql theme={null}
INSERT INTO trips_gcs SELECT trip_id, pickup_date, pickup_datetime, dropoff_datetime, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude, passenger_count, trip_distance, tip_amount, total_amount, payment_type FROM s3('https://ch-nyc-taxi.s3.eu-west-3.amazonaws.com/tsv/trips_{0..9}.tsv.gz', 'TabSeparatedWithNames') LIMIT 1000000;
```

Depending on the hardware, this insert of 1 million rows may take a few minutes to execute. You can monitor its progress via the `system.processes` table. Feel free to raise the row count up to the limit of 10 million and explore some sample queries.

```sql theme={null}
SELECT passenger_count, avg(tip_amount) AS avg_tip, avg(total_amount) AS avg_amount FROM trips_gcs GROUP BY passenger_count;
```
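To follow the insert while it runs, one option is to poll `system.processes` from a shell. The sketch below assumes `clickhouse-client` is installed and can reach the server with default credentials. The function name `insert_progress` and the `CH_CLIENT` variable are illustrative and not part of ClickHouse.

```bash
# Assumption: CH_CLIENT lets you point at a different client binary if needed.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Hypothetical helper: list running INSERT queries that target trips_gcs,
# with elapsed time and the number of rows read and written so far.
insert_progress() {
    "$CH_CLIENT" --query "
        SELECT query_id, elapsed, read_rows, written_rows
        FROM system.processes
        WHERE query ILIKE 'INSERT INTO trips_gcs%'
        FORMAT Vertical"
}
```

Run `insert_progress` in a second terminal while the insert is executing. An empty result means no matching insert is currently running.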

<h3 id="handling-replication">
  Handling replication
</h3>

Replication with GCS disks can be accomplished by using the `ReplicatedMergeTree` table engine.  See the [replicating a single shard across two GCP regions using GCS](#gcs-multi-region) guide for details.

<h3 id="learn-more">
  Learn more
</h3>

The [Cloud Storage XML API](https://cloud.google.com/storage/docs/xml-api/overview) is interoperable with some tools and libraries that work with services such as Amazon Simple Storage Service (Amazon S3).

For further information on tuning threads, see [Optimizing for Performance](/integrations/connectors/data-ingestion/AWS/integrating-s3-with-clickhouse#s3-optimizing-performance).

<h2 id="gcs-multi-region">
  Using Google Cloud Storage (GCS)
</h2>

<Tip>
  Object storage is used by default in ClickHouse Cloud, so you don't need to follow this procedure if you're running in ClickHouse Cloud.
</Tip>

<h3 id="plan-the-deployment">
  Plan the deployment
</h3>

This tutorial describes a replicated ClickHouse deployment running in Google Cloud that uses Google Cloud Storage (GCS) as the ClickHouse storage disk type.

In the tutorial, you will deploy ClickHouse server nodes on Google Compute Engine VMs, each with an associated GCS bucket for storage. Replication is coordinated by a set of ClickHouse Keeper nodes, also deployed as VMs.

Sample requirements for high availability:

* Two ClickHouse server nodes, in two GCP regions
* Two GCS buckets, deployed in the same regions as the two ClickHouse server nodes
* Three ClickHouse Keeper nodes. Two of them are deployed in the same regions as the ClickHouse server nodes. The third can be in the same region as one of the first two Keeper nodes, but in a different availability zone.

ClickHouse Keeper needs a majority of its nodes (a quorum) to function. With three nodes, two are enough to keep the cluster working if one fails, hence the requirement for three nodes for high availability.

<h3 id="prepare-vms">
  Prepare virtual machines
</h3>

Deploy five VMs in three regions:

| Region | ClickHouse Server | Bucket              | ClickHouse Keeper |
| ------ | ----------------- | ------------------- | ----------------- |
| 1      | `chnode1`         | `bucket_regionname` | `keepernode1`     |
| 2      | `chnode2`         | `bucket_regionname` | `keepernode2`     |
| 3 `*`  |                   |                     | `keepernode3`     |

`*` This can be a different availability zone in the same region as 1 or 2.

<h4 id="deploy-clickhouse">
  Deploy ClickHouse
</h4>

Deploy ClickHouse on two hosts. In the sample configurations these are named `chnode1` and `chnode2`.

Place `chnode1` in one GCP region, and `chnode2` in a second. In this guide `us-east1` and `us-east4` are used for the Compute Engine VMs, and also for the GCS buckets.

<Note>
  Don't start `clickhouse server` until after it is configured.  Just install it.
</Note>

Refer to the [installation instructions](/get-started/setup/install) when performing the deployment steps on the ClickHouse server nodes.

<h4 id="deploy-clickhouse-keeper">
  Deploy ClickHouse Keeper
</h4>

Deploy ClickHouse Keeper on three hosts. In the sample configurations these are named `keepernode1`, `keepernode2`, and `keepernode3`. `keepernode1` can be deployed in the same region as `chnode1`, `keepernode2` with `chnode2`, and `keepernode3` in either region, but in a different availability zone from the ClickHouse node in that region.

Refer to the [installation instructions](/get-started/setup/install) when performing the deployment steps on the ClickHouse Keeper nodes.

<h3 id="create-two-buckets">
  Create two buckets
</h3>

The two ClickHouse servers will be located in different regions for high availability.  Each will have a GCS bucket in the same region.

In **Cloud Storage > Buckets** choose **CREATE BUCKET**. For this tutorial two buckets are created, one in each of `us-east1` and `us-east4`. The buckets are single region, standard storage class, and not public. When prompted, enable public access prevention. Don't create folders; they will be created when ClickHouse writes to the storage.

If you need step-by-step instructions to create buckets and an HMAC key, then expand **Create GCS buckets and an HMAC key** and follow along:

<Accordion title="Create GCS buckets and an HMAC key">
  <h3 id="ch_bucket_us_east1">
    ch\_bucket\_us\_east1
  </h3>

  <Image size="md" img="https://mintcdn.com/private-7c7dfe99-fix-nav-issues/4Hi2sd8mn4aAdMgN/images/integrations/data-ingestion/s3/GCS-bucket-1.png?fit=max&auto=format&n=4Hi2sd8mn4aAdMgN&q=85&s=e0c0ad9fba7b427581e3f4b028596c6d" alt="Creating a GCS bucket in US East 1" border width="1437" height="387" data-path="images/integrations/data-ingestion/s3/GCS-bucket-1.png" />

  <h3 id="ch_bucket_us_east4">
    ch\_bucket\_us\_east4
  </h3>

  <Image size="md" img="https://mintcdn.com/private-7c7dfe99-fix-nav-issues/4Hi2sd8mn4aAdMgN/images/integrations/data-ingestion/s3/GCS-bucket-2.png?fit=max&auto=format&n=4Hi2sd8mn4aAdMgN&q=85&s=ad5c9dbc29699e04539ea52e080f89c0" alt="Creating a GCS bucket in US East 4" border width="1437" height="386" data-path="images/integrations/data-ingestion/s3/GCS-bucket-2.png" />

  <h3 id="generate-an-access-key">
    Generate an access key
  </h3>

  <h3 id="create-a-service-account-hmac-key-and-secret">
    Create a service account HMAC key and secret
  </h3>

  Open **Cloud Storage > Settings > Interoperability** and either choose an existing **Access key**, or **CREATE A KEY FOR A SERVICE ACCOUNT**.  This guide covers the path for creating a new key for a new service account.

  <Image size="md" img="https://mintcdn.com/private-7c7dfe99-fix-nav-issues/4Hi2sd8mn4aAdMgN/images/integrations/data-ingestion/s3/GCS-create-a-service-account-key.png?fit=max&auto=format&n=4Hi2sd8mn4aAdMgN&q=85&s=f974c115e77c37dfd8babb3dc4021afb" alt="Generating a service account HMAC key in GCS" border width="969" height="911" data-path="images/integrations/data-ingestion/s3/GCS-create-a-service-account-key.png" />

  <h3 id="add-a-new-service-account">
    Add a new service account
  </h3>

  If this is a project with no existing service account, choose **CREATE NEW ACCOUNT**.

  <Image size="md" img="https://mintcdn.com/private-7c7dfe99-fix-nav-issues/4Hi2sd8mn4aAdMgN/images/integrations/data-ingestion/s3/GCS-create-service-account-0.png?fit=max&auto=format&n=4Hi2sd8mn4aAdMgN&q=85&s=6f669a997a89b502f110b5df2e3923fe" alt="Adding a new service account in GCS" border width="924" height="317" data-path="images/integrations/data-ingestion/s3/GCS-create-service-account-0.png" />

  There are three steps to creating the service account. In the first step, give the account a meaningful name, ID, and description.

  <Image size="md" img="https://mintcdn.com/private-7c7dfe99-fix-nav-issues/4Hi2sd8mn4aAdMgN/images/integrations/data-ingestion/s3/GCS-create-service-account-a.png?fit=max&auto=format&n=4Hi2sd8mn4aAdMgN&q=85&s=5903f80f48c84d3c5e2c6cdcdf7271bd" alt="Defining a new service account name and ID in GCS" border width="842" height="737" data-path="images/integrations/data-ingestion/s3/GCS-create-service-account-a.png" />

  In the Interoperability settings dialog, the IAM role **Storage Object Admin** is recommended; select that role in step two.

  <Image size="md" img="https://mintcdn.com/private-7c7dfe99-fix-nav-issues/4Hi2sd8mn4aAdMgN/images/integrations/data-ingestion/s3/GCS-create-service-account-2.png?fit=max&auto=format&n=4Hi2sd8mn4aAdMgN&q=85&s=5fbcf2b01bf93f82b0b023c61dc24b1f" alt="Selecting IAM role Storage Object Admin in GCS" border width="822" height="396" data-path="images/integrations/data-ingestion/s3/GCS-create-service-account-2.png" />

  Step three is optional and not used in this guide.  You may allow users to have these privileges based on your policies.

  <Image size="md" img="https://mintcdn.com/private-7c7dfe99-fix-nav-issues/4Hi2sd8mn4aAdMgN/images/integrations/data-ingestion/s3/GCS-create-service-account-3.png?fit=max&auto=format&n=4Hi2sd8mn4aAdMgN&q=85&s=d0402ccd1193c0db63d9e76c66e6a4c9" alt="Configuring additional settings for the new service account in GCS" border width="635" height="697" data-path="images/integrations/data-ingestion/s3/GCS-create-service-account-3.png" />

  The service account HMAC key will be displayed.  Save this information, as it will be used in the ClickHouse configuration.

  <Image size="md" img="https://mintcdn.com/private-7c7dfe99-fix-nav-issues/4Hi2sd8mn4aAdMgN/images/integrations/data-ingestion/s3/GCS-guide-key.png?fit=max&auto=format&n=4Hi2sd8mn4aAdMgN&q=85&s=b257895924f1b6b2e7bca54cd0f7eae8" alt="Retrieving the generated HMAC key for GCS" border width="917" height="390" data-path="images/integrations/data-ingestion/s3/GCS-guide-key.png" />
</Accordion>

<h3 id="configure-clickhouse-keeper">
  Configure ClickHouse Keeper
</h3>

All of the ClickHouse Keeper nodes have the same configuration file except for the `server_id` line (first highlighted line below).  Modify the file with the hostnames for your ClickHouse Keeper servers, and on each of the servers set the `server_id` to match the appropriate `server` entry in the `raft_configuration`.  Since this example has `server_id` set to `3`, we have highlighted the matching lines in the `raft_configuration`.

* Edit the file with your hostnames, and make sure that they resolve from the ClickHouse server nodes and the Keeper nodes
* Copy the file into place (`/etc/clickhouse-keeper/keeper_config.xml`) on each of the Keeper servers
* Edit the `server_id` on each machine, based on its entry number in the `raft_configuration`

```xml title=/etc/clickhouse-keeper/keeper_config.xml highlight={12,33-37} theme={null}
<clickhouse>
    <logger>
        <level>trace</level>
        <log>/var/log/clickhouse-keeper/clickhouse-keeper.log</log>
        <errorlog>/var/log/clickhouse-keeper/clickhouse-keeper.err.log</errorlog>
        <size>1000M</size>
        <count>3</count>
    </logger>
    <listen_host>0.0.0.0</listen_host>
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <server_id>3</server_id>
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

        <coordination_settings>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <session_timeout_ms>30000</session_timeout_ms>
            <raft_logs_level>warning</raft_logs_level>
        </coordination_settings>

        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>keepernode1.us-east1-b.c.clickhousegcs-374921.internal</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>2</id>
                <hostname>keepernode2.us-east4-c.c.clickhousegcs-374921.internal</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>3</id>
                <hostname>keepernode3.us-east5-a.c.clickhousegcs-374921.internal</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>
```
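Because `server_id` is the only line that differs between the Keeper nodes, the per-node edit can be scripted. The following is a minimal sketch, assuming the file contains exactly one `<server_id>` element as in the example above. The function name `set_server_id` is hypothetical.

```bash
# Hypothetical helper: rewrite the <server_id> value in a Keeper config file.
# Usage: set_server_id FILE ID
set_server_id() {
    file="$1"
    id="$2"
    tmp="$(mktemp)" || return 1
    # Replace the number between the tags with the requested id, then write
    # the result back in place so the file keeps its owner and permissions.
    sed "s|<server_id>[0-9]*</server_id>|<server_id>${id}</server_id>|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

On `keepernode1`, for example, you could call `set_server_id /etc/clickhouse-keeper/keeper_config.xml 1` as a user that can write to the file, and use `2` and `3` on the other two nodes.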

<h3 id="configure-clickhouse-server">
  Configure ClickHouse server
</h3>

<Info>
  **Best practice**

  Some of the steps in this guide will ask you to place a configuration file in `/etc/clickhouse-server/config.d/`.  This is the default location on Linux systems for configuration override files.  When you put these files into that directory ClickHouse will merge the content with the default configuration.  By placing these files in the `config.d` directory you will avoid losing your configuration during an upgrade.
</Info>

<h4 id="networking">
  Networking
</h4>

By default, ClickHouse listens on the loopback interface. In a replicated setup, networking between machines is necessary, so listen on all interfaces:

```xml title=/etc/clickhouse-server/config.d/network.xml theme={null}
<clickhouse>
    <listen_host>0.0.0.0</listen_host>
</clickhouse>
```

<h4 id="remote-clickhouse-keeper-servers">
  Remote ClickHouse Keeper servers
</h4>

Replication is coordinated by ClickHouse Keeper.  This configuration file identifies the ClickHouse Keeper nodes by hostname and port number.

* Edit the hostnames to match your Keeper hosts

```xml title=/etc/clickhouse-server/config.d/use-keeper.xml theme={null}
<clickhouse>
    <zookeeper>
        <node index="1">
            <host>keepernode1.us-east1-b.c.clickhousegcs-374921.internal</host>
            <port>9181</port>
        </node>
        <node index="2">
            <host>keepernode2.us-east4-c.c.clickhousegcs-374921.internal</host>
            <port>9181</port>
        </node>
        <node index="3">
            <host>keepernode3.us-east5-a.c.clickhousegcs-374921.internal</host>
            <port>9181</port>
        </node>
    </zookeeper>
</clickhouse>
```

<h4 id="remote-clickhouse-servers">
  Remote ClickHouse servers
</h4>

This file configures the hostname and port of each ClickHouse server in the cluster. The default configuration file contains sample cluster definitions. To show only the clusters that are completely configured, the attribute `replace="true"` is added to the `remote_servers` entry so that, when this configuration is merged with the default, it replaces the `remote_servers` section instead of adding to it.

* Edit the file with your hostnames, and make sure that they resolve from the ClickHouse server nodes

```xml title=/etc/clickhouse-server/config.d/remote-servers.xml theme={null}
<clickhouse>
    <remote_servers replace="true">
        <cluster_1S_2R>
            <shard>
                <replica>
                    <host>chnode1.us-east1-b.c.clickhousegcs-374921.internal</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chnode2.us-east4-c.c.clickhousegcs-374921.internal</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_1S_2R>
    </remote_servers>
</clickhouse>
```

<h4 id="replica-identification">
  Replica identification
</h4>

This file configures settings related to the ClickHouse Keeper path, specifically the macros that identify which replica the data belongs to. On one server the replica should be specified as `replica_1`, and on the other server as `replica_2`. The names can be changed: based on our example of one replica being stored in South Carolina and the other in Northern Virginia, the values could be `carolina` and `virginia`. Just make sure that they're different on each machine.

```xml title=/etc/clickhouse-server/config.d/macros.xml highlight={8} theme={null}
<clickhouse>
    <distributed_ddl>
        <path>/clickhouse/task_queue/ddl</path>
    </distributed_ddl>
    <macros>
        <cluster>cluster_1S_2R</cluster>
        <shard>1</shard>
        <replica>replica_1</replica>
    </macros>
</clickhouse>
```

<h4 id="storage-in-gcs">
  Storage in GCS
</h4>

ClickHouse storage configuration includes `disks` and `policies`. The disk being configured below is named `gcs`, and is of `type` `s3`. The type is `s3` because ClickHouse accesses the GCS bucket as if it were an AWS S3 bucket. Two copies of this configuration are needed, one for each of the ClickHouse server nodes.

Make the following substitutions in the configuration below.

These substitutions differ between the two ClickHouse server nodes:

* `REPLICA 1 BUCKET` should be set to the name of the bucket in the same region as the server
* `REPLICA 1 FOLDER` should be changed to `replica_1` on one of the servers, and `replica_2` on the other

These substitutions are common across the two nodes:

* The `access_key_id` should be set to the HMAC key generated earlier
* The `secret_access_key` should be set to the HMAC secret generated earlier

```xml title=/etc/clickhouse-server/config.d/storage.xml theme={null}
<clickhouse>
    <storage_configuration>
        <disks>
            <gcs>
                <support_batch_delete>true</support_batch_delete>
                <type>s3</type>
                <endpoint>https://storage.googleapis.com/REPLICA 1 BUCKET/REPLICA 1 FOLDER/</endpoint>
                <access_key_id>SERVICE ACCOUNT HMAC KEY</access_key_id>
                <secret_access_key>SERVICE ACCOUNT HMAC SECRET</secret_access_key>
                <metadata_path>/var/lib/clickhouse/disks/gcs/</metadata_path>
            </gcs>
            <cache>
                <type>cache</type>
                <disk>gcs</disk>
                <path>/var/lib/clickhouse/disks/gcs_cache/</path>
                <max_size>10Gi</max_size>
            </cache>
        </disks>
        <policies>
            <gcs_main>
                <volumes>
                    <main>
                        <disk>gcs</disk>
                    </main>
                </volumes>
            </gcs_main>
        </policies>
    </storage_configuration>
</clickhouse>
```
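Since the two copies differ only in bucket and folder, they can be rendered from one template. The sketch below assumes the HMAC key and secret are supplied through environment variables named `GCS_HMAC_KEY` and `GCS_HMAC_SECRET`. Those names and the function `write_storage_xml` are illustrative, not ClickHouse conventions.

```bash
# Hypothetical helper: render storage.xml for one replica.
# Usage: write_storage_xml BUCKET FOLDER OUT_DIR
write_storage_xml() {
    bucket="$1"
    folder="$2"
    out_dir="$3"
    # Refuse to write a config with empty credentials.
    if [ -z "${GCS_HMAC_KEY:-}" ] || [ -z "${GCS_HMAC_SECRET:-}" ]; then
        echo "GCS_HMAC_KEY and GCS_HMAC_SECRET must be set" >&2
        return 1
    fi
    mkdir -p "$out_dir" || return 1
    # Unquoted heredoc delimiter, so the shell variables below are expanded.
    cat > "$out_dir/storage.xml" <<EOF
<clickhouse>
    <storage_configuration>
        <disks>
            <gcs>
                <support_batch_delete>true</support_batch_delete>
                <type>s3</type>
                <endpoint>https://storage.googleapis.com/${bucket}/${folder}/</endpoint>
                <access_key_id>${GCS_HMAC_KEY}</access_key_id>
                <secret_access_key>${GCS_HMAC_SECRET}</secret_access_key>
                <metadata_path>/var/lib/clickhouse/disks/gcs/</metadata_path>
            </gcs>
            <cache>
                <type>cache</type>
                <disk>gcs</disk>
                <path>/var/lib/clickhouse/disks/gcs_cache/</path>
                <max_size>10Gi</max_size>
            </cache>
        </disks>
        <policies>
            <gcs_main>
                <volumes>
                    <main>
                        <disk>gcs</disk>
                    </main>
                </volumes>
            </gcs_main>
        </policies>
    </storage_configuration>
</clickhouse>
EOF
}
```

With the bucket names used in this guide, `write_storage_xml ch_bucket_us_east1 replica_1 ./chnode1` and `write_storage_xml ch_bucket_us_east4 replica_2 ./chnode2` would produce one file per server. Copy each to `/etc/clickhouse-server/config.d/storage.xml` on the matching node. The generated files contain the HMAC secret, so treat them accordingly.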

<h3 id="start-clickhouse-keeper">
  Start ClickHouse Keeper
</h3>

Use the commands for your operating system, for example:

```bash theme={null}
sudo systemctl enable clickhouse-keeper
sudo systemctl start clickhouse-keeper
sudo systemctl status clickhouse-keeper
```

<h4 id="check-clickhouse-keeper-status">
  Check ClickHouse Keeper status
</h4>

Send commands to the ClickHouse Keeper with `netcat`.  For example, `mntr` returns the state of the ClickHouse Keeper cluster.  If you run the command on each of the Keeper nodes you will see that one is a leader, and the other two are followers:

```bash theme={null}
echo mntr | nc localhost 9181
```

```response highlight={7-9,18-19} theme={null}
zk_version      v22.7.2.15-stable-f843089624e8dd3ff7927b8a125cf3a7a769c069
zk_avg_latency  0
zk_max_latency  11
zk_min_latency  0
zk_packets_received     1783
zk_packets_sent 1783
zk_num_alive_connections        2
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count  135
zk_watch_count  8
zk_ephemerals_count     3
zk_approximate_data_size        42533
zk_key_arena_size       28672
zk_latest_snapshot_size 0
zk_open_file_descriptor_count   182
zk_max_file_descriptor_count    18446744073709551615
zk_followers    2
zk_synced_followers     2
```
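To compare the role of every Keeper node at a glance, the `zk_server_state` line can be extracted from the `mntr` output. This is a minimal sketch that assumes `nc` can reach port 9181 on each host. The function names `keeper_state` and `keeper_states` are hypothetical.

```bash
# Hypothetical helper: read `mntr` output on stdin and print the
# value of the zk_server_state line.
keeper_state() {
    awk '$1 == "zk_server_state" { print $2 }'
}

# Hypothetical helper: print "host<TAB>state" for each Keeper host
# passed as an argument.
keeper_states() {
    for host in "$@"; do
        printf '%s\t%s\n' "$host" "$(echo mntr | nc "$host" 9181 | keeper_state)"
    done
}
```

Calling `keeper_states keepernode1 keepernode2 keepernode3` (with your own hostnames) should list one leader and two followers for a healthy three-node cluster.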

<h3 id="start-clickhouse-server">
  Start ClickHouse server
</h3>

On `chnode1` and `chnode2` run:

```bash theme={null}
sudo service clickhouse-server start
```

```bash theme={null}
sudo service clickhouse-server status
```

<h3 id="verification">
  Verification
</h3>

<h4 id="verify-disk-configuration">
  Verify disk configuration
</h4>

`system.disks` should contain records for each disk:

* default
* gcs
* cache

```sql theme={null}
SELECT *
FROM system.disks
FORMAT Vertical
```

```response theme={null}
Row 1:
──────
name:             cache
path:             /var/lib/clickhouse/disks/gcs/
free_space:       18446744073709551615
total_space:      18446744073709551615
unreserved_space: 18446744073709551615
keep_free_space:  0
type:             s3
is_encrypted:     0
is_read_only:     0
is_write_once:    0
is_remote:        1
is_broken:        0
cache_path:       /var/lib/clickhouse/disks/gcs_cache/

Row 2:
──────
name:             default
path:             /var/lib/clickhouse/
free_space:       6555529216
total_space:      10331889664
unreserved_space: 6555529216
keep_free_space:  0
type:             local
is_encrypted:     0
is_read_only:     0
is_write_once:    0
is_remote:        0
is_broken:        0
cache_path:

Row 3:
──────
name:             gcs
path:             /var/lib/clickhouse/disks/gcs/
free_space:       18446744073709551615
total_space:      18446744073709551615
unreserved_space: 18446744073709551615
keep_free_space:  0
type:             s3
is_encrypted:     0
is_read_only:     0
is_write_once:    0
is_remote:        1
is_broken:        0
cache_path:

3 rows in set. Elapsed: 0.002 sec.
```

<h4 id="verify-that-tables-created-on-the-cluster-are-created-on-both-nodes">
  Verify that tables created on the cluster are created on both nodes
</h4>

```sql highlight={1,18} theme={null}
CREATE TABLE trips ON CLUSTER 'cluster_1S_2R' (
 `trip_id` UInt32,
 `pickup_date` Date,
 `pickup_datetime` DateTime,
 `dropoff_datetime` DateTime,
 `pickup_longitude` Float64,
 `pickup_latitude` Float64,
 `dropoff_longitude` Float64,
 `dropoff_latitude` Float64,
 `passenger_count` UInt8,
 `trip_distance` Float64,
 `tip_amount` Float32,
 `total_amount` Float32,
 `payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4))
ENGINE = ReplicatedMergeTree
PARTITION BY toYYYYMM(pickup_date)
ORDER BY pickup_datetime
SETTINGS storage_policy='gcs_main'
```

```response theme={null}
┌─host───────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ chnode2.us-east4-c.c.gcsqa-375100.internal │ 9000 │      0 │       │                   1 │                1 │
└────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host───────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ chnode1.us-east1-b.c.gcsqa-375100.internal │ 9000 │      0 │       │                   0 │                0 │
└────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

2 rows in set. Elapsed: 0.641 sec.
```

<h4 id="verify-that-data-can-be-inserted">
  Verify that data can be inserted
</h4>

```sql theme={null}
INSERT INTO trips SELECT
    trip_id,
    pickup_date,
    pickup_datetime,
    dropoff_datetime,
    pickup_longitude,
    pickup_latitude,
    dropoff_longitude,
    dropoff_latitude,
    passenger_count,
    trip_distance,
    tip_amount,
    total_amount,
    payment_type
FROM s3('https://ch-nyc-taxi.s3.eu-west-3.amazonaws.com/tsv/trips_{0..9}.tsv.gz', 'TabSeparatedWithNames')
LIMIT 1000000
```
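The insert runs on a single node, so it is worth confirming that the rows also arrive on the other replica. The sketch below queries each server in turn. It assumes `clickhouse-client` can connect to both hosts with default credentials, and the names `replica_counts` and `CH_CLIENT` are illustrative.

```bash
# Assumption: CH_CLIENT lets you point at a different client binary if needed.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Hypothetical helper: print "host<TAB>row count" of the trips table
# for each ClickHouse host passed as an argument.
replica_counts() {
    for host in "$@"; do
        count="$("$CH_CLIENT" --host "$host" --query "SELECT count() FROM trips")" || return 1
        printf '%s\t%s\n' "$host" "$count"
    done
}
```

Call it with the hostnames of your two servers, for example `replica_counts chnode1 chnode2`. Once replication has caught up, both lines should report the same count.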

<h4 id="verify-that-the-storage-policy-gcs_main-is-used-for-the-table">
  Verify that the storage policy `gcs_main` is used for the table
</h4>

```sql theme={null}
SELECT
    engine,
    data_paths,
    metadata_path,
    storage_policy,
    formatReadableSize(total_bytes)
FROM system.tables
WHERE name = 'trips'
FORMAT Vertical
```

```response theme={null}
Row 1:
──────
engine:                          ReplicatedMergeTree
data_paths:                      ['/var/lib/clickhouse/disks/gcs/store/631/6315b109-d639-4214-a1e7-afbd98f39727/']
metadata_path:                   /var/lib/clickhouse/store/e0f/e0f3e248-7996-44d4-853e-0384e153b740/trips.sql
storage_policy:                  gcs_main
formatReadableSize(total_bytes): 36.42 MiB

1 row in set. Elapsed: 0.002 sec.
```

<h4 id="verify-in-google-cloud-console">
  Verify in Google Cloud console
</h4>

Looking at the buckets you will see that a folder was created in each bucket with the name that was used in the `storage.xml` configuration file.  Expand the folders and you will see many files, representing the data partitions.

<h4 id="bucket-for-replica-one">
  Bucket for replica one
</h4>

<Image img="https://mintcdn.com/private-7c7dfe99-fix-nav-issues/4Hi2sd8mn4aAdMgN/images/integrations/data-ingestion/s3/GCS-examine-bucket-1.png?fit=max&auto=format&n=4Hi2sd8mn4aAdMgN&q=85&s=8a4f86bba111c0b5a19268042e5199c9" size="lg" border alt="Replica one bucket in Google Cloud Storage showing folder structure with data partitions" width="958" height="736" data-path="images/integrations/data-ingestion/s3/GCS-examine-bucket-1.png" />

<h4 id="bucket-for-replica-two">
  Bucket for replica two
</h4>

<Image img="https://mintcdn.com/private-7c7dfe99-fix-nav-issues/4Hi2sd8mn4aAdMgN/images/integrations/data-ingestion/s3/GCS-examine-bucket-2.png?fit=max&auto=format&n=4Hi2sd8mn4aAdMgN&q=85&s=d2a02d5b80e8b798e6793d6980c6c288" size="lg" border alt="Replica two bucket in Google Cloud Storage showing folder structure with data partitions" width="958" height="736" data-path="images/integrations/data-ingestion/s3/GCS-examine-bucket-2.png" />
