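# Debugging DataStore

> Debug DataStore operations with explain(), profiling, and logging

DataStore provides three debugging tools that help you understand and optimize your data pipelines: `explain()`, a profiler, and logging.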

<div id="overview">
  ## Debugging tools overview
</div>

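| Tool        | Purpose                 | When to use                         |
| ----------- | ----------------------- | ----------------------------------- |
| `explain()` | View the execution plan | To see the SQL that will run        |
| Profiler    | Analyze performance     | To find slow operations             |
| Logging     | View execution details  | To troubleshoot unexpected behavior |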

<div id="decision-matrix">
  ## Quick decision matrix
</div>

| Need                    | Tool        | Command                     |
| ----------------------- | ----------- | --------------------------- |
| View the execution plan | `explain()` | `ds.explain()`              |
| Measure performance     | Profiler    | `config.enable_profiling()` |
| Debug SQL queries       | Logging     | `config.enable_debug()`     |
| All of the above        | Combined    | See below                   |

<div id="quick-setup">
  ## Quick setup
</div>

<div id="enable-all">
  ### Enable all debugging features
</div>

```python theme={null}
from chdb import datastore as pd
from chdb.datastore.config import config, get_profiler

# Enable all debugging features
config.enable_debug()        # verbose logging
config.enable_profiling()    # performance tracking

ds = pd.read_csv("data.csv")
result = ds.filter(ds['age'] > 25).groupby('city').agg({'salary': 'mean'})

# View the execution plan
result.explain()

# Print the profiler report
profiler = get_profiler()
profiler.report()
```

***

<div id="explain">
  ## The explain() method
</div>

View the execution plan before running a query.

```python title="Query" theme={null}
ds = pd.read_csv("data.csv")

query = (ds
    .filter(ds['amount'] > 1000)
    .groupby('region')
    .agg({'amount': ['sum', 'mean']})
)

# View the execution plan
query.explain()
```

```text title="Response" theme={null}
Pipeline:
  Source: file('data.csv', 'CSVWithNames')
  Filter: amount > 1000
  GroupBy: region
  Aggregate: sum(amount), avg(amount)

Generated SQL:
SELECT region, SUM(amount) AS sum, AVG(amount) AS mean
FROM file('data.csv', 'CSVWithNames')
WHERE amount > 1000
GROUP BY region
```

See the [explain() documentation](/zh/products/chdb/debugging/explain) for details.

***

<div id="profiling">
  ## Profiling
</div>

Measure the execution time of each operation.

```python title="Query" theme={null}
from chdb.datastore.config import config, get_profiler

# Enable profiling
config.enable_profiling()

# Run the operations
ds = pd.read_csv("large_data.csv")
result = (ds
    .filter(ds['amount'] > 100)
    .groupby('category')
    .agg({'amount': 'sum'})
    .sort('sum', ascending=False)
    .head(10)
    .to_df()
)

# View the report
profiler = get_profiler()
profiler.report(min_duration_ms=0.1)
```

```text title="Response" theme={null}
Performance Report
==================
Step                          Duration    Calls
----                          --------    -----
read_csv                      1.234s      1
filter                        0.002s      1
groupby                       0.001s      1
agg                           0.089s      1
sort                          0.045s      1
head                          0.001s      1
to_df (SQL execution)         0.567s      1
----                          --------    -----
Total                         1.939s      7
```

See the [profiling guide](/zh/products/chdb/debugging/profiling) for details.
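
To make concrete what a per-step report like the one above records, here is a minimal sketch of a step timer built only on Python's standard library. It is not DataStore's profiler: the `StepTimer` class and its method names are hypothetical and exist only to illustrate accumulating durations and call counts per step.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Hypothetical helper: accumulates wall-clock time and call counts per step."""

    def __init__(self):
        self.durations = defaultdict(float)  # step name -> total seconds
        self.calls = defaultdict(int)        # step name -> number of calls

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the step raises
            self.durations[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self, min_duration_ms=0.0):
        """Return (name, seconds, calls) rows, slowest first, above the threshold."""
        rows = [
            (name, secs, self.calls[name])
            for name, secs in self.durations.items()
            if secs * 1000 >= min_duration_ms
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)


timer = StepTimer()
with timer.step("load"):
    data = list(range(100_000))
with timer.step("filter"):
    kept = [x for x in data if x % 2 == 0]

for name, secs, calls in timer.report():
    print(f"{name:<10} {secs:.6f}s  {calls}")
```

Sorting the rows by duration puts the slowest step first, which is usually the first place to look when optimizing.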

***

<div id="logging">
  ## Logging
</div>

View detailed execution logs.

```python theme={null}
from chdb.datastore.config import config

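# Enable debug logging
config.enable_debug()

# Run your operations; the logs will show:
# - the generated SQL queries
# - the execution engine used
# - cache hits and misses
# - timing information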
```

Example log output:

```text theme={null}
DEBUG - DataStore: Creating from file 'data.csv'
DEBUG - Query: SELECT region, SUM(amount) FROM ... WHERE amount > 1000 GROUP BY region
DEBUG - Engine: Using chdb for aggregation
DEBUG - Execution time: 0.089s
DEBUG - Cache: Storing result (key: abc123)
```

See [logging configuration](/zh/products/chdb/debugging/logging) for details.
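
When the debug output gets long, it can help to pull out only the lines you care about, such as the generated SQL. The sketch below assumes the log lines have the same `LEVEL - Field: message` shape as the example output above; the real format may differ, and `parse_log` is a hypothetical helper, not part of DataStore.

```python
import re

# Sample lines copied from the example output above; real output may differ
LOG_TEXT = """\
DEBUG - DataStore: Creating from file 'data.csv'
DEBUG - Query: SELECT region, SUM(amount) FROM ... WHERE amount > 1000 GROUP BY region
DEBUG - Engine: Using chdb for aggregation
DEBUG - Execution time: 0.089s
DEBUG - Cache: Storing result (key: abc123)
"""

# Assumed line shape: "<LEVEL> - <Field>: <message>"
LINE_RE = re.compile(r"^(?P<level>[A-Z]+) - (?P<field>[A-Za-z ]+): (?P<message>.*)$")


def parse_log(text):
    """Group messages by field name, e.g. 'Query' or 'Execution time'."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            fields.setdefault(match["field"], []).append(match["message"])
    return fields


fields = parse_log(LOG_TEXT)
print(fields["Query"][0])
print(fields["Execution time"][0])
```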

***

<div id="scenarios">
  ## Common debugging scenarios
</div>

<div id="scenario-wrong-results">
  ### 1. Query results are not what you expect
</div>

```python theme={null}
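from chdb.datastore.config import config

# Step 1: inspect the execution plan (ds is an existing DataStore)
query = ds.filter(ds['age'] > 25).groupby('city').sum()
query.explain(verbose=True)

# Step 2: enable logging to see the generated SQL
config.enable_debug()

# Step 3: run the query and check the logs
result = query.to_df()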
```

<div id="scenario-slow">
  ### 2. Query runs slowly
</div>

```python theme={null}
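from chdb.datastore.config import config, get_profiler

# Step 1: enable profiling
config.enable_profiling()

# Step 2: run the query (process_data() stands in for your own pipeline)
result = process_data()

# Step 3: view the profiler report
profiler = get_profiler()
profiler.report()

# Step 4: identify the slow operations and optimize them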
```

<div id="scenario-engine">
  ### 3. Understanding engine selection
</div>

```python theme={null}
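# Enable verbose logging
config.enable_debug()

# Run the operations (custom_func is your own Python function)
result = ds.filter(ds['x'] > 10).apply(custom_func)

# The logs show which engine each operation used:
# DEBUG - filter: Using chdb engine
# DEBUG - apply: Using pandas engine (custom function)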
```

<div id="scenario-cache">
  ### 4. Debugging cache issues
</div>

```python theme={null}
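# Enable debug logging to see cache operations
config.enable_debug()

# First run
result1 = ds.filter(ds['x'] > 10).to_df()
# Log: cache miss, executing query

# Second run (should use the cache)
result2 = ds.filter(ds['x'] > 10).to_df()
# Log: cache hit, returning cached result

# If caching does not behave as expected, check:
# - Are the operations exactly the same?
# - Is caching enabled? config.cache_enabled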
```

***

<div id="best-practices">
  ## Best practices
</div>

<div id="best-practice-1">
  ### 1. Debug in development, not in production
</div>

```python theme={null}
import logging

# Development
config.enable_debug()
config.enable_profiling()

# Production
config.set_log_level(logging.WARNING)
config.set_profiling_enabled(False)
```
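
A common way to keep debugging out of production is to make the decision from an environment variable. The sketch below only computes that decision with the standard library. The variable name `APP_ENV` and the helper `debug_settings` are hypothetical, and applying the result to `config` is shown as a comment rather than executed.

```python
import logging
import os


def debug_settings(env=None):
    """Pick a log level and profiling flag from a hypothetical APP_ENV variable."""
    env = env if env is not None else os.environ.get("APP_ENV", "production")
    if env == "development":
        return {"log_level": logging.DEBUG, "profiling": True}
    # Anything else is treated as production: quiet logs, no profiling
    return {"log_level": logging.WARNING, "profiling": False}


settings = debug_settings("development")
print(logging.getLevelName(settings["log_level"]), settings["profiling"])

# With DataStore, the flags could then drive the calls shown above, e.g.:
# if settings["profiling"]:
#     config.enable_profiling()
```

Defaulting to production when the variable is unset means a missing configuration never turns verbose logging on by accident.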

<div id="best-practice-2">
  ### 2. Use explain() before running large queries
</div>

```python theme={null}
# Build the query
query = ds.filter(...).groupby(...).agg(...)

# Check the execution plan first
query.explain()

# If the plan looks right, execute
result = query.to_df()
```

<div id="best-practice-3">
  ### 3. Profile first, then optimize
</div>

```python theme={null}
# Don't guess where the bottleneck is; measure it
config.enable_profiling()
result = your_pipeline()
get_profiler().report()
```

<div id="best-practice-4">
  ### 4. Inspect the SQL when results are wrong
</div>

```python theme={null}
# View the generated SQL
print(query.to_sql())

# Compare it with the SQL you expect
# Run the SQL directly in ClickHouse to verify
```
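
Comparing two queries by eye is error-prone when they differ only in layout. As a rough sketch, the standard library's `difflib` can show token-level differences after whitespace is normalized. The helpers `normalize_sql` and `sql_diff` are hypothetical, and the `generated` string is a hand-written stand-in for the output of `query.to_sql()`.

```python
import difflib


def normalize_sql(sql):
    """Collapse whitespace and drop a trailing semicolon so layout differences are ignored."""
    return " ".join(sql.split()).rstrip(";").strip()


def sql_diff(generated, expected):
    """Return unified-diff lines between two queries, one token per line."""
    gen_tokens = normalize_sql(generated).split(" ")
    exp_tokens = normalize_sql(expected).split(" ")
    return list(difflib.unified_diff(exp_tokens, gen_tokens,
                                     fromfile="expected", tofile="generated",
                                     lineterm=""))


# In practice `generated` would come from query.to_sql()
generated = """
SELECT region, SUM(amount) AS sum
FROM file('data.csv', 'CSVWithNames')
WHERE amount > 1000
GROUP BY region
"""
expected = ("SELECT region, SUM(amount) AS sum "
            "FROM file('data.csv', 'CSVWithNames') "
            "WHERE amount >= 1000 GROUP BY region")

for line in sql_diff(generated, expected):
    print(line)
```

Here the diff isolates the one meaningful difference, `>` versus `>=`, while ignoring the line breaks in the generated query.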

***

<div id="summary">
  ## Debugging tools summary
</div>

| Task                    | Command                     | Output                |
| ----------------------- | --------------------------- | --------------------- |
| View execution plan     | `ds.explain()`              | Execution steps + SQL |
| Detailed execution plan | `ds.explain(verbose=True)`  | + metadata            |
| View SQL                | `ds.to_sql()`               | SQL query string      |
| Enable debugging        | `config.enable_debug()`     | Verbose logs          |
| Enable profiling        | `config.enable_profiling()` | Timing data           |
| Profiler report         | `get_profiler().report()`   | Performance summary   |
| Reset profiler          | `get_profiler().reset()`    | Clears timing data    |

***

<div id="next-steps">
  ## Next steps
</div>

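* [explain() method](/zh/products/chdb/debugging/explain) - Detailed execution plan documentation
* [Profiling guide](/zh/products/chdb/debugging/profiling) - Performance profiling
* [Logging configuration](/zh/products/chdb/debugging/logging) - Log levels and output format settings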