# explain() method

> View DataStore execution plans with the explain() method

The `explain()` method displays the execution plan for a DataStore query, which helps you understand which operations will run and what SQL will be generated.

<div id="basic">
  ## Basic usage
</div>

```python theme={null}
from pathlib import Path
Path("sales.csv").write_text("""\
region,product,category,amount,quantity,price,date,order_id
East,Widget,Electronics,5200,10,120,2024-01-15,1001
West,Gadget,Electronics,800,5,160,2024-02-20,1002
East,Gizmo,Home,6500,3,100,2024-03-10,1003
North,Widget,Electronics,4500,6,150,2024-06-18,1004
West,Gadget,Electronics,2000,8,250,2024-09-14,1005
""")

from chdb import datastore as pd

ds = pd.read_csv("sales.csv")

query = (ds
    .filter(ds['amount'] > 1000)
    .groupby('region')
    .agg({'amount': ['sum', 'mean']})
    .sort('sum', ascending=False)
)

# View the execution plan
query.explain()
```

<div id="syntax">
  ## Syntax
</div>

```python theme={null}
explain(verbose=False) -> None
```

**Parameters:**

| Parameter | Type | Default | Description               |
| --------- | ---- | ------- | ------------------------- |
| `verbose` | bool | `False` | Shows additional metadata |

<div id="output-format">
  ## Output format
</div>

<div id="standard">
  ### Standard output
</div>

```text theme={null}
================================================================================
Execution Plan (in execution order)
================================================================================

 [1] 📊 Data Source: file('sales.csv', 'csv')

Operations:
────────────────────────────────────────────────────────────────────────────────
    ️  Segment 1 [chDB] (from source): Operations 2-5
    ️  Note: SQL operations after Pandas ops use Python() table function

 [2] 🚀 [chDB] WHERE: "amount" > 1000
 [3] 🚀 [chDB] GROUP BY: region
 [4] 🚀 [chDB] AGGREGATE: sum(amount), avg(amount)
 [5] 🚀 [chDB] ORDER BY: sum DESC

────────────────────────────────────────────────────────────────────────────────
Final State: 📊 Pending (lazy, not yet executed)
             └─> Will execute when print(), .to_df(), .execute() is called

────────────────────────────────────────────────────────────────────────────────
Generated SQL Query:
────────────────────────────────────────────────────────────────────────────────

SELECT region, SUM(amount) AS sum, AVG(amount) AS mean
FROM file('sales.csv', 'csv')
WHERE "amount" > 1000
GROUP BY region
ORDER BY sum DESC

================================================================================
```

<div id="icons">
  ### Icon legend
</div>

| Icon | Meaning              |
| ---- | -------------------- |
| 📊   | Data source          |
| 🚀   | chDB (SQL) operation |
| 🐼   | pandas operation     |

<div id="verbose">
  ### Verbose output
</div>

```python theme={null}
query.explain(verbose=True)
```

Verbose mode shows additional detail for each operation, including the full SQL query with the internal mechanisms used to track row order.

***

<div id="phases">
  ## Three execution phases
</div>

The `explain()` output shows operations in three phases:

<div id="phase-1">
  ### Phase 1: SQL query building (lazy)
</div>

Operations that are compiled to SQL:

```text theme={null}
  1. Source: file('sales.csv', 'CSVWithNames')
  2. Filter: amount > 1000      
  3. GroupBy: region
  4. Aggregate: sum(amount)
```

<div id="phase-2">
  ### Phase 2: Execution point
</div>

When a trigger occurs:

```text theme={null}
  5. Execute SQL -> DataFrame
     Trigger: to_df() called
```

<div id="phase-3">
  ### Phase 3: DataFrame operations
</div>

Operations that run after execution:

```text theme={null}
  6. [pandas] pivot_table(...)
  7. [pandas] apply(custom_func)
```
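
The three phases can be illustrated with a toy model. The sketch below is not chDB's implementation: `LazyQuery` is a hypothetical class, and it uses the standard-library `sqlite3` module in place of chDB so that it runs anywhere. It records operations without running them (phase 1), executes the SQL only when `to_rows()` is called (phase 2), and then applies plain Python functions to the result (phase 3).

```python
import sqlite3


class LazyQuery:
    """Hypothetical toy model of a lazy query; not part of chDB."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table
        self.filters = []    # phase 1: SQL conditions, recorded only
        self.post_ops = []   # phase 3: Python functions run after execution
        self.executed = False

    def filter(self, condition):
        self.filters.append(condition)
        return self

    def apply(self, func):
        self.post_ops.append(func)
        return self

    def to_sql(self):
        sql = f"SELECT region, amount FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql

    def to_rows(self):
        # Phase 2: this call is the trigger that actually runs the SQL.
        rows = self.conn.execute(self.to_sql()).fetchall()
        self.executed = True
        # Phase 3: operations that could not be expressed as SQL.
        for func in self.post_ops:
            rows = [func(row) for row in rows]
        return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 5200), ("West", 800), ("North", 4500)],
)

query = (
    LazyQuery(conn, "sales")
    .filter("amount > 1000")
    .apply(lambda row: (row[0].upper(), row[1]))
)
print(query.executed)   # False: nothing has run yet
rows = query.to_rows()  # the trigger
print(rows)
```

Until `to_rows()` is called, the object only holds a description of the work, which is what makes it possible to inspect a plan before paying for execution.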

***

<div id="understanding">
  ## Understanding the plan
</div>

<div id="source">
  ### Source information
</div>

```text theme={null}
Source: file('sales.csv', 'CSVWithNames')
```

* `file()` - the ClickHouse `file()` table function
* `'CSVWithNames'` - file format with a header row

Other source types:

```text theme={null}
Source: s3('bucket/data.parquet', ...)
Source: mysql('host', 'db', 'table', ...)
Source: __dataframe__  (pandas DataFrame input)
```

<div id="filter">
  ### Filter operations
</div>

```text theme={null}
Filter: amount > 1000 AND status = 'active'
```

Shows the WHERE clause that will be applied.

<div id="groupby">
  ### GroupBy and Aggregate
</div>

```text theme={null}
GroupBy: region, category
Aggregate: sum(amount), avg(amount), count(id)
```

Shows the GROUP BY columns and the aggregate functions.

<div id="sort">
  ### Sort operations
</div>

```text theme={null}
Sort: sum DESC, region ASC
```

Shows the ORDER BY clause.

<div id="limit">
  ### Limit operations
</div>

```text theme={null}
Limit: 10
Offset: 100
```

Shows LIMIT and OFFSET.
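
How these plan entries fit together into a single statement can be sketched in a few lines. `build_sql` is a hypothetical helper written for illustration, not a chDB function. It assumes that consecutive filters are joined with `AND`, as in the generated SQL shown in the examples on this page.

```python
def build_sql(source, filters=(), group_by=(), aggregates=(),
              order_by=(), limit=None, offset=None):
    """Assemble a SELECT statement from plan steps (illustrative only)."""
    columns = list(group_by) + list(aggregates)
    parts = [
        f"SELECT {', '.join(columns) if columns else '*'}",
        f"FROM {source}",
    ]
    if filters:
        parts.append("WHERE " + " AND ".join(filters))
    if group_by:
        parts.append("GROUP BY " + ", ".join(group_by))
    if order_by:
        parts.append("ORDER BY " + ", ".join(order_by))
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    if offset is not None:
        parts.append(f"OFFSET {offset}")
    return "\n".join(parts)


sql = build_sql(
    source="file('sales.csv', 'csv')",
    filters=['"amount" > 1000'],
    group_by=["region"],
    aggregates=["SUM(amount) AS sum", "AVG(amount) AS mean"],
    order_by=["sum DESC"],
    limit=10,
)
print(sql)
```

Each keyword argument corresponds to one kind of plan entry, so reading a plan from top to bottom is equivalent to reading the clauses of the final query.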

***

<div id="engine">
  ## Engine information
</div>

In verbose mode, you can see which engine will be used:

```text theme={null}
Filter: amount > 1000
  - Engine: chdb
  - Pushdown: Yes

Apply: custom_function
  - Engine: pandas
  - Pushdown: No
```

<div id="pushdown">
  ### Pushdown
</div>

* **Yes**: The operation runs at the data source (SQL)
* **No**: The operation has to run in pandas
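
To scan a long verbose plan for operations that are not pushed down, the text can be parsed. The sketch below assumes the verbose output follows the layout of the sample above (a `Name: detail` line followed by indented `- Key: Value` lines). `parse_engine_info` is a hypothetical helper, and the real output may differ between versions.

```python
def parse_engine_info(text):
    """Parse 'Op: detail' headers followed by indented '- Key: Value' lines."""
    operations = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and operations:
            # Attribute line: attach it to the most recent operation.
            key, _, value = stripped[2:].partition(":")
            operations[-1][key.strip().lower()] = value.strip()
        else:
            # Header line: start a new operation entry.
            kind, _, detail = stripped.partition(":")
            operations.append({"op": kind.strip(), "detail": detail.strip()})
    return operations


sample = """\
Filter: amount > 1000
  - Engine: chdb
  - Pushdown: Yes

Apply: custom_function
  - Engine: pandas
  - Pushdown: No
"""

ops = parse_engine_info(sample)
not_pushed = [op["op"] for op in ops if op.get("pushdown") == "No"]
print(not_pushed)  # ['Apply']
```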

***

<div id="examples">
  ## Examples
</div>

<div id="example-simple">
  ### Simple query
</div>

```python theme={null}
from pathlib import Path
Path("data.csv").write_text("""\
name,age,city,salary,department
Alice,25,NYC,55000,Engineering
Bob,30,LA,65000,Product
Charlie,35,NYC,80000,Engineering
Diana,28,SF,70000,Design
Eve,42,NYC,95000,Product
""")

from chdb import datastore as pd

ds = pd.read_csv("data.csv")
ds.filter(ds['age'] > 25).explain()
```

```text theme={null}
================================================================================
Execution Plan (in execution order)
================================================================================

 [1] 📊 Data Source: file('data.csv', 'csv')

Operations:
────────────────────────────────────────────────────────────────────────────────
    ️  Segment 1 [chDB] (from source): Operations 2-2

 [2] 🚀 [chDB] WHERE: "age" > 25

────────────────────────────────────────────────────────────────────────────────
Generated SQL Query:
────────────────────────────────────────────────────────────────────────────────

SELECT * FROM file('data.csv', 'csv') WHERE "age" > 25

================================================================================
```

<div id="example-complex">
  ### Complex aggregation
</div>

```python theme={null}
# This example uses the sales data from the basic usage section
ds = pd.read_csv("sales.csv")

query = (ds
    .filter(ds['date'] >= '2024-01-01')
    .filter(ds['amount'] > 100)
    .select('region', 'category', 'amount')
    .groupby('region', 'category')
    .agg({
        'amount': ['sum', 'mean', 'count']
    })
    .sort('sum', ascending=False)
    .limit(20)
)
query.explain()
```

```text theme={null}
================================================================================
Execution Plan (in execution order)
================================================================================

 [1] 📊 Data Source: file('sales.csv', 'csv')

Operations:
────────────────────────────────────────────────────────────────────────────────
    ️  Segment 1 [chDB] (from source): Operations 2-8

 [2] 🚀 [chDB] WHERE: "date" >= '2024-01-01'
 [3] 🚀 [chDB] WHERE: "amount" > 100
 [4] 🚀 [chDB] SELECT: region, category, amount
 [5] 🚀 [chDB] GROUP BY: region, category
 [6] 🚀 [chDB] AGGREGATE: sum(amount), avg(amount), count(amount)
 [7] 🚀 [chDB] ORDER BY: sum DESC
 [8] 🚀 [chDB] LIMIT: 20

────────────────────────────────────────────────────────────────────────────────
Generated SQL Query:
────────────────────────────────────────────────────────────────────────────────

SELECT region, category, 
       SUM(amount) AS sum, 
       AVG(amount) AS mean, 
       COUNT(amount) AS count
FROM file('sales.csv', 'csv')
WHERE "date" >= '2024-01-01' AND "amount" > 100
GROUP BY region, category
ORDER BY sum DESC
LIMIT 20

================================================================================
```

<div id="example-mixed">
  ### Mixed SQL and pandas
</div>

When operations cannot be pushed down to SQL in full, the plan shows multiple segments:

```python theme={null}
query = (ds
    .filter(ds['age'] > 25)           # SQL
    .groupby('city')                   # SQL
    .agg({'salary': 'mean'})           # SQL
    .apply(lambda x: x * 1.1)          # pandas (triggers a segment split)
    .filter(ds['mean'] > 50000)        # SQL (new segment)
)
query.explain()
```

```text theme={null}
================================================================================
Execution Plan (in execution order)
================================================================================

 [1] 📊 Data Source: file('data.csv', 'csv')

Operations:
────────────────────────────────────────────────────────────────────────────────
    ️  Segment 1 [chDB] (from source): Operations 2-4
    ️  Segment 2 [Pandas] (on DataFrame): Operation 5
    ️  Segment 3 [chDB] (on DataFrame): Operation 6
    ️  Note: SQL operations after Pandas ops use Python() table function

 [2] 🚀 [chDB] WHERE: "age" > 25
 [3] 🚀 [chDB] GROUP BY: city
 [4] 🚀 [chDB] AGGREGATE: avg(salary)
 [5] 🐼 [Pandas] APPLY: lambda
 [6] 🚀 [chDB] WHERE: "mean" > 50000

================================================================================
```
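
The segment boundaries in this plan are consistent with grouping consecutive operations by engine. A minimal sketch of that idea uses `itertools.groupby` on a hand-written list of `(engine, description)` pairs. The list and `split_segments` are illustrative only, not chDB internals.

```python
from itertools import groupby


def split_segments(operations, first_index=2):
    """Group consecutive operations that run on the same engine."""
    segments = []
    index = first_index  # operation [1] is the data source
    for engine, run in groupby(operations, key=lambda op: op[0]):
        steps = [description for _, description in run]
        segments.append((engine, index, index + len(steps) - 1))
        index += len(steps)
    return segments


operations = [
    ("chDB", 'WHERE: "age" > 25'),
    ("chDB", "GROUP BY: city"),
    ("chDB", "AGGREGATE: avg(salary)"),
    ("Pandas", "APPLY: lambda"),
    ("chDB", 'WHERE: "mean" > 50000'),
]

for number, (engine, start, end) in enumerate(split_segments(operations), 1):
    print(f"Segment {number} [{engine}]: operations {start}-{end}")
```

A single pandas step in the middle of a pipeline is enough to produce three segments, which is why moving such steps to the end of the chain tends to keep more work in SQL.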

***

<div id="debugging">
  ## Debugging with explain()
</div>

<div id="debug-filter">
  ### Check filter logic
</div>

```python theme={null}
# Check that your filter is correct
query = ds.filter((ds['age'] > 25) & (ds['city'] == 'NYC'))
query.explain()
# The output shows: Filter: age > 25 AND city = 'NYC'
```

<div id="debug-select">
  ### Check column selection
</div>

```python theme={null}
# Check column pruning
query = ds.select('name', 'age').filter(ds['age'] > 25)
query.explain()
# The output shows: SELECT name, age FROM ... WHERE age > 25
```

<div id="debug-agg">
  ### Understand aggregation
</div>

```python theme={null}
# Check the aggregate functions
query = ds.groupby('dept').agg({'salary': ['sum', 'mean', 'std']})
query.explain()
# The output shows: SELECT dept, SUM(salary), AVG(salary), stddevPop(salary)
```
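
The aggregation names passed to `.agg()` correspond to SQL functions in the plans on this page. That correspondence can be written down as a small lookup table. The entries below are taken only from the outputs shown here (`sum`, `mean`, `count`, `std`). `AGG_TO_SQL` and `agg_to_sql` are hypothetical names, and chDB may map other functions differently.

```python
# Mapping inferred from the example outputs on this page (an assumption,
# not an exhaustive or authoritative list).
AGG_TO_SQL = {
    "sum": "SUM",
    "mean": "AVG",
    "count": "COUNT",
    "std": "stddevPop",
}


def agg_to_sql(spec):
    """Turn {'column': ['agg', ...]} into SQL select expressions."""
    expressions = []
    for column, funcs in spec.items():
        if isinstance(funcs, str):
            funcs = [funcs]  # allow {'salary': 'mean'} as well as a list
        for func in funcs:
            expressions.append(f"{AGG_TO_SQL[func]}({column}) AS {func}")
    return expressions


print(agg_to_sql({"salary": ["sum", "mean", "std"]}))
```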

***

<div id="best-practices">
  ## Best practices
</div>

<div id="best-practice-1">
  ### 1. Check the plan before running large queries
</div>

```python theme={null}
# Always explain first when working with large data
query = ds.complex_pipeline()
query.explain()  # Check the plan

# If the plan looks right
result = query.to_df()  # Execute
```

<div id="best-practice-2">
  ### 2. Use verbose mode for debugging
</div>

```python theme={null}
# When something looks wrong
query.explain(verbose=True)
# Shows engine selection and pushdown information
```

<div id="best-practice-3">
  ### 3. Compare with to\_sql()
</div>

```python theme={null}
# explain() shows the plan
query.explain()

# to_sql() shows only the SQL
print(query.to_sql())

# Each is useful for a different purpose
```

<div id="best-practice-4">
  ### 4. Check pushdown status
</div>

```python theme={null}
# Verbose mode shows whether operations are pushed down
query.explain(verbose=True)

# If Pushdown: No, the operation runs in pandas
# Consider restructuring the query to improve performance
```
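
Because `explain()` returns `None`, checking a plan programmatically means capturing what it prints. The sketch below assumes the plan is written to standard output. It uses a stand-in function, `fake_explain` (hypothetical, printing lines copied from the mixed example on this page), so that it runs without chDB. With a real query you would pass `query.explain` instead.

```python
import contextlib
import io


def capture_plan(explain_func, **kwargs):
    """Run a function that prints a plan and return the printed text."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        explain_func(**kwargs)
    return buffer.getvalue()


def fake_explain(verbose=False):
    # Stand-in for query.explain(); prints a shortened sample plan.
    print(' [2] 🚀 [chDB] WHERE: "age" > 25')
    print(" [5] 🐼 [Pandas] APPLY: lambda")


plan = capture_plan(fake_explain, verbose=True)
pandas_steps = [line.strip() for line in plan.splitlines() if "[Pandas]" in line]
print(pandas_steps)
```

A check like this could run in a test suite to flag pipelines where an operation unexpectedly falls back to pandas.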
