
> Allows processing files from HDFS in parallel from many nodes in a specified cluster.

# hdfsCluster

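Allows processing files from HDFS in parallel from many nodes in a specified cluster. On the initiator node, it creates a connection to all nodes in the cluster, expands the asterisks in the HDFS file path, and dispatches each file dynamically. On a worker node, it asks the initiator for the next task to process and processes it. This is repeated until all tasks are finished.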

<div id="syntax">
  ## Syntax
</div>

```sql theme={null}
hdfsCluster(cluster_name, URI, format, structure)
```

<div id="arguments">
  ## Arguments
</div>

| 参数             | 描述                                                                                                                                                                                              |
| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cluster_name` | 集群名称，用于构建远程和本地服务器的一组地址及连接参数。                                                                                                                                                                    |
| `URI`          | 指向单个文件或一组文件的 URI。在只读模式下支持以下通配符：`*`、`**`、`?`、`{'abc','def'}` 和 `{N..M}`，其中 `N`、`M` 为数字，`abc`、`def` 为字符串。更多信息请参见[路径中的通配符](/zh/reference/engines/table-engines/integrations/s3#wildcards-in-path)。 |
| `format`       | 文件的[格式](/zh/reference/formats)。                                                                                                                                                                 |
| `structure`    | 表的结构。格式为 `'column1_name column1_type, column2_name column2_type, ...'`。                                                                                                                         |
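
Because `cluster_name` must refer to a cluster known to the server, it can help to check which nodes that name resolves to before running a distributed read. The query below is a minimal sketch that assumes the cluster is called `cluster_simple`, as in the examples on this page; it reads the `system.clusters` system table.

```sql theme={null}
-- List the nodes that belong to the cluster named 'cluster_simple'.
-- The cluster name is an assumption taken from the examples on this page.
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'cluster_simple'
```

If the query returns no rows, the cluster name is not defined in the server configuration.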

<div id="returned_value">
  ## Returned value
</div>

A table with the specified structure for reading data from the specified files.
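
Since the result is a table with the declared columns, it can be filtered and aggregated like any other table. The following sketch assumes a cluster named `cluster_simple` and TSV files under `hdfs://hdfs1:9000/some_dir/` with the columns `name String, value UInt32`; these names are illustrative and match the examples on this page.

```sql theme={null}
-- Sum `value` per `name` across all files in one directory.
-- Cluster name, path, and column names are illustrative assumptions.
SELECT name, sum(value) AS total
FROM hdfsCluster('cluster_simple', 'hdfs://hdfs1:9000/some_dir/*', 'TSV', 'name String, value UInt32')
GROUP BY name
ORDER BY total DESC
LIMIT 10
```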

<div id="examples">
  ## Examples
</div>

1. Suppose that we have a ClickHouse cluster named `cluster_simple`, and several files with the following URIs on HDFS:

* 'hdfs\://hdfs1:9000/some\_dir/some\_file\_1'
* 'hdfs\://hdfs1:9000/some\_dir/some\_file\_2'
* 'hdfs\://hdfs1:9000/some\_dir/some\_file\_3'
* 'hdfs\://hdfs1:9000/another\_dir/some\_file\_1'
* 'hdfs\://hdfs1:9000/another\_dir/some\_file\_2'
* 'hdfs\://hdfs1:9000/another\_dir/some\_file\_3'

2. Query the number of rows in these files:

```sql theme={null}
SELECT count(*)
FROM hdfsCluster('cluster_simple', 'hdfs://hdfs1:9000/{some,another}_dir/some_file_{1..3}', 'TSV', 'name String, value UInt32')
```

3. Query the number of rows in all files of these two directories:

```sql theme={null}
SELECT count(*)
FROM hdfsCluster('cluster_simple', 'hdfs://hdfs1:9000/{some,another}_dir/*', 'TSV', 'name String, value UInt32')
```

<Note>
  If your list of files contains number ranges with leading zeros, use the brace construction for each digit separately, or use `?`.
</Note>
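
As an illustration of the note above, suppose a hypothetical directory `big_dir` holds files named `file_000` through `file_999`. The directory and file names are assumptions made for this sketch; the cluster, format, and structure are reused from the earlier examples.

```sql theme={null}
-- One {0..9} range per digit matches file_000 through file_999.
SELECT count(*)
FROM hdfsCluster('cluster_simple', 'hdfs://hdfs1:9000/big_dir/file_{0..9}{0..9}{0..9}', 'TSV', 'name String, value UInt32')
```

```sql theme={null}
-- Alternative: each ? matches any single character, not only a digit.
SELECT count(*)
FROM hdfsCluster('cluster_simple', 'hdfs://hdfs1:9000/big_dir/file_???', 'TSV', 'name String, value UInt32')
```

The `?` form is shorter but less strict: it would also match a name such as `file_abc` if one existed in the directory.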

<div id="related">
  ## Related
</div>

* [HDFS engine](/zh/reference/engines/table-engines/integrations/hdfs)
* [HDFS table function](/zh/reference/functions/table-functions/hdfs)