# DataStore factory methods

> Create DataStore instances from files, databases, cloud storage, and data lakes

DataStore provides more than 20 factory methods for creating instances from local files, cloud storage, databases, data lakes, and in-memory objects.

<h2 id="uri">
  Universal URI Interface
</h2>

The `uri()` method is the recommended universal entry point that auto-detects the source type:

```python theme={null}
from chdb.datastore import DataStore

# Local files
ds = DataStore.uri("data.csv")
ds = DataStore.uri("/path/to/data.parquet")

# Cloud storage
ds = DataStore.uri("s3://bucket/data.parquet?nosign=true")
ds = DataStore.uri("https://example.com/data.csv")

# Databases
ds = DataStore.uri("mysql://user:pass@host:3306/db/table")
ds = DataStore.uri("postgresql://user:pass@host:5432/db/table")
```

<h3 id="uri-syntax">
  URI Syntax Reference
</h3>

| Source Type | URI Format                                  | Example                                                |
| ----------- | ------------------------------------------- | ------------------------------------------------------ |
| Local file  | `path/to/file`                              | `data.csv`, `/abs/path/data.parquet`                   |
| S3          | `s3://bucket/path`                          | `s3://mybucket/data.parquet?nosign=true`               |
| GCS         | `gs://bucket/path`                          | `gs://mybucket/data.csv`                               |
| Azure       | `az://container/path`                       | `az://mycontainer/data.parquet`                        |
| HTTP/HTTPS  | `https://url`                               | `https://example.com/data.csv`                         |
| MySQL       | `mysql://user:pass@host:port/db/table`      | `mysql://root:pass@localhost:3306/mydb/users`          |
| PostgreSQL  | `postgresql://user:pass@host:port/db/table` | `postgresql://postgres:pass@localhost:5432/mydb/users` |
| SQLite      | `sqlite:///path?table=name`                 | `sqlite:///data.db?table=users`                        |
| ClickHouse  | `clickhouse://host:port/db/table`           | `clickhouse://localhost:9000/default/hits`             |
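
To make the layout in the table concrete, the sketch below splits two of the example URIs into their parts using Python's standard `urllib.parse`. It only illustrates the URI structure; it is not how `DataStore.uri()` is implemented, and the helper name `split_uri` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def split_uri(uri):
    """Break a URI from the table above into named parts (illustrative helper)."""
    parts = urlsplit(uri)
    # The path looks like "/db/table" or "/data.parquet"; drop empty segments.
    segments = [s for s in parts.path.split("/") if s]
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path_segments": segments,
        "query": parse_qs(parts.query),
    }

mysql_parts = split_uri("mysql://root:pass@localhost:3306/mydb/users")
s3_parts = split_uri("s3://mybucket/data.parquet?nosign=true")
print(mysql_parts)
print(s3_parts)
```

For database URIs the two path segments are the database and the table; for object storage the host position holds the bucket and options such as `nosign` travel in the query string.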

***

<h2 id="file-sources">
  File Sources
</h2>

<h3 id="from-file">
  `from_file`
</h3>

Create DataStore from a local or remote file with automatic format detection.

```python theme={null}
DataStore.from_file(path, format=None, compression=None, **kwargs)
```

**Parameters:**

| Parameter     | Type | Default    | Description                              |
| ------------- | ---- | ---------- | ---------------------------------------- |
| `path`        | str  | *required* | File path (local or URL)                 |
| `format`      | str  | `None`     | File format (auto-detected if None)      |
| `compression` | str  | `None`     | Compression type (auto-detected if None) |

**Supported formats:** CSV, TSV, Parquet, JSON, JSONLines, ORC, Avro, Arrow

**Examples:**

```python theme={null}
from chdb.datastore import DataStore

# Auto-detect format from extension
ds = DataStore.from_file("data.csv")
ds = DataStore.from_file("data.parquet")
ds = DataStore.from_file("data.json")

# Explicit format
ds = DataStore.from_file("data.txt", format="CSV")

# With compression
ds = DataStore.from_file("data.csv.gz", compression="gzip")
```
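
The examples above suggest that format and compression can be read off the file name. A minimal sketch of that idea follows. The lookup tables and the `guess_format` helper are hypothetical and are not chdb's actual detection logic, which this page does not specify.

```python
from pathlib import PurePosixPath

# Hypothetical lookup tables for illustration; chdb's real rules may differ.
COMPRESSION_BY_SUFFIX = {".gz": "gzip", ".zst": "zstd"}
FORMAT_BY_SUFFIX = {
    ".csv": "CSV",
    ".tsv": "TSV",
    ".parquet": "Parquet",
    ".json": "JSON",
    ".jsonl": "JSONLines",
    ".orc": "ORC",
    ".avro": "Avro",
    ".arrow": "Arrow",
}

def guess_format(path):
    """Return (format, compression) guessed from the file name; None where unknown."""
    suffixes = [s.lower() for s in PurePosixPath(path).suffixes]
    compression = None
    # A trailing compression suffix such as ".gz" is peeled off first.
    if suffixes and suffixes[-1] in COMPRESSION_BY_SUFFIX:
        compression = COMPRESSION_BY_SUFFIX[suffixes.pop()]
    fmt = FORMAT_BY_SUFFIX.get(suffixes[-1]) if suffixes else None
    return fmt, compression

print(guess_format("data.csv.gz"))
print(guess_format("data.parquet"))
print(guess_format("data.txt"))
```

Under a lookup like this, `data.txt` maps to no known format, which is the kind of case the explicit `format="CSV"` argument covers.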

<h3 id="pandas-read">
  Pandas-compatible read functions
</h3>

```python theme={null}
from chdb import datastore as pd

# CSV files
ds = pd.read_csv("data.csv")
ds = pd.read_csv("data.csv", sep=";", header=0, nrows=1000)

# Parquet files (recommended for large datasets)
ds = pd.read_parquet("data.parquet")
ds = pd.read_parquet("data.parquet", columns=['col1', 'col2'])

# JSON files
ds = pd.read_json("data.json")
ds = pd.read_json("data.jsonl", lines=True)

# Excel files
ds = pd.read_excel("data.xlsx", sheet_name="Sheet1")
```

***

<h2 id="cloud-storage">
  Cloud Storage
</h2>

<h3 id="from-s3">
  `from_s3`
</h3>

Create DataStore from Amazon S3.

```python theme={null}
DataStore.from_s3(url, access_key_id=None, secret_access_key=None, format=None, **kwargs)
```

**Parameters:**

| Parameter           | Type | Default    | Description                 |
| ------------------- | ---- | ---------- | --------------------------- |
| `url`               | str  | *required* | S3 URL (s3://bucket/path)   |
| `access_key_id`     | str  | `None`     | AWS access key ID           |
| `secret_access_key` | str  | `None`     | AWS secret access key       |
| `format`            | str  | `None`     | File format (auto-detected) |

**Examples:**

```python theme={null}
from chdb.datastore import DataStore

# Anonymous access (public bucket)
ds = DataStore.from_s3("s3://bucket/data.parquet")

# With credentials
ds = DataStore.from_s3(
    "s3://bucket/data.parquet",
    access_key_id="AKIAIOSFODNN7EXAMPLE",
    secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
)

# Using URI with query parameters
ds = DataStore.uri("s3://bucket/data.parquet?nosign=true")
ds = DataStore.uri("s3://bucket/data.parquet?access_key_id=KEY&secret_access_key=SECRET")
```
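
Rather than hard-coding keys, the query-parameter form can be assembled from environment variables. The sketch below uses only the standard library; the helper name `s3_uri` is hypothetical, and it assumes `DataStore.uri()` accepts percent-encoded values in the query string, which this page does not confirm.

```python
import os
from urllib.parse import urlencode

def s3_uri(bucket_path, env=os.environ):
    """Build an s3:// URI with credentials from env, or nosign=true if none are set."""
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        # urlencode percent-escapes characters such as "/" and "+" in the secret.
        query = urlencode({"access_key_id": key, "secret_access_key": secret})
    else:
        query = "nosign=true"
    return f"s3://{bucket_path}?{query}"

public_uri = s3_uri("bucket/data.parquet", env={})
private_uri = s3_uri(
    "bucket/data.parquet",
    env={"AWS_ACCESS_KEY_ID": "KEY", "AWS_SECRET_ACCESS_KEY": "SECRET"},
)
print(public_uri)
print(private_uri)
```

The resulting strings have the same shape as the two `DataStore.uri(...)` calls shown above.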

<h3 id="from-gcs">
  `from_gcs`
</h3>

Create DataStore from Google Cloud Storage.

```python theme={null}
DataStore.from_gcs(url, credentials_path=None, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_gcs("gs://bucket/data.parquet")
ds = DataStore.from_gcs("gs://bucket/data.parquet", credentials_path="/path/to/creds.json")
```

<h3 id="from-azure">
  `from_azure`
</h3>

Create DataStore from Azure Blob Storage.

```python theme={null}
DataStore.from_azure(url, account_name=None, account_key=None, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_azure(
    "az://container/data.parquet",
    account_name="myaccount",
    account_key="mykey"
)
```

<h3 id="from-hdfs">
  `from_hdfs`
</h3>

Create DataStore from HDFS.

```python theme={null}
DataStore.from_hdfs(url, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_hdfs("hdfs://namenode:8020/path/data.parquet")
```

<h3 id="from-url">
  `from_url`
</h3>

Create DataStore from an HTTP/HTTPS URL.

```python theme={null}
DataStore.from_url(url, format=None, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_url("https://example.com/data.csv")
ds = DataStore.from_url("https://raw.githubusercontent.com/user/repo/main/data.parquet")
```

***

<h2 id="databases">
  Databases
</h2>

<h3 id="from-mysql">
  `from_mysql`
</h3>

Create DataStore from a MySQL database.

```python theme={null}
DataStore.from_mysql(host, database, table, user, password, port=3306, **kwargs)
```

**Parameters:**

| Parameter  | Type | Default    | Description   |
| ---------- | ---- | ---------- | ------------- |
| `host`     | str  | *required* | MySQL host    |
| `database` | str  | *required* | Database name |
| `table`    | str  | *required* | Table name    |
| `user`     | str  | *required* | Username      |
| `password` | str  | *required* | Password      |
| `port`     | int  | `3306`     | Port number   |

**Examples:**

```python theme={null}
ds = DataStore.from_mysql(
    host="localhost",
    database="mydb",
    table="users",
    user="root",
    password="password"
)

# Using URI
ds = DataStore.uri("mysql://root:password@localhost:3306/mydb/users")
```

<h3 id="from-postgresql">
  `from_postgresql`
</h3>

Create DataStore from a PostgreSQL database.

```python theme={null}
DataStore.from_postgresql(host, database, table, user, password, port=5432, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_postgresql(
    host="localhost",
    database="mydb",
    table="users",
    user="postgres",
    password="password"
)

# Using URI
ds = DataStore.uri("postgresql://postgres:password@localhost:5432/mydb/users")
```

<h3 id="from-clickhouse">
  `from_clickhouse`
</h3>

Create DataStore from a ClickHouse server.

```python theme={null}
DataStore.from_clickhouse(host, database=None, table=None, user=None, password=None, port=9000, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_clickhouse(
    host="localhost",
    database="default",
    table="hits",
    user="default",
    password=""
)

# Connection-level mode (explore databases)
ds = DataStore.from_clickhouse(
    host="analytics.company.com",
    user="analyst",
    password="secret"
)
ds.databases()                  # List databases
ds.tables("production")         # List tables
result = ds.sql("SELECT * FROM production.users LIMIT 10")
```

<h3 id="from-mongodb">
  `from_mongodb`
</h3>

Create DataStore from MongoDB.

```python theme={null}
DataStore.from_mongodb(uri, database, collection, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_mongodb(
    uri="mongodb://localhost:27017",
    database="mydb",
    collection="users"
)
```

<h3 id="from-sqlite">
  `from_sqlite`
</h3>

Create DataStore from a SQLite database.

```python theme={null}
DataStore.from_sqlite(database_path, table, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_sqlite("data.db", table="users")

# Using URI
ds = DataStore.uri("sqlite:///data.db?table=users")
```
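
To try `from_sqlite` without an existing database, a small fixture can be built with Python's standard `sqlite3` module. This is a sketch only: the temporary file location and the `users` table schema are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile

# Create a throwaway database file containing a small `users` table.
db_path = os.path.join(tempfile.mkdtemp(), "data.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Alice", 25), ("Bob", 30)],
)
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
conn.close()

print(db_path, row_count)
# The file can then be opened as in the examples above:
# ds = DataStore.from_sqlite(db_path, table="users")
```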

***

<h2 id="data-lakes">
  Data Lakes
</h2>

<h3 id="from-iceberg">
  `from_iceberg`
</h3>

Create DataStore from an Apache Iceberg table.

```python theme={null}
DataStore.from_iceberg(path, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_iceberg("/path/to/iceberg_table")
ds = DataStore.uri("iceberg://catalog/namespace/table")
```

<h3 id="from-delta">
  `from_delta`
</h3>

Create DataStore from a Delta Lake table.

```python theme={null}
DataStore.from_delta(path, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_delta("/path/to/delta_table")
ds = DataStore.uri("deltalake:///path/to/delta_table")
```

<h3 id="from-hudi">
  `from_hudi`
</h3>

Create DataStore from an Apache Hudi table.

```python theme={null}
DataStore.from_hudi(path, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_hudi("/path/to/hudi_table")
ds = DataStore.uri("hudi:///path/to/hudi_table")
```

***

<h2 id="in-memory">
  In-Memory Sources
</h2>

<h3 id="from-df">
  `from_df` / `from_dataframe`
</h3>

Create DataStore from a pandas DataFrame.

```python theme={null}
DataStore.from_df(df, name=None)
DataStore.from_dataframe(df, name=None)  # alias
```

**Examples:**

```python theme={null}
import pandas
from chdb.datastore import DataStore

pdf = pandas.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
ds = DataStore.from_df(pdf)
```

<h3 id="dataframe-constructor">
  `DataFrame` Constructor
</h3>

Create DataStore using a pandas-like constructor.

```python theme={null}
from chdb import datastore as pd

# From dictionary
ds = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30]
})

# From pandas DataFrame
import pandas
pdf = pandas.DataFrame({'a': [1, 2, 3]})
ds = pd.DataFrame(pdf)
```

***

<h2 id="special-sources">
  Special Sources
</h2>

<h3 id="from-numbers">
  `from_numbers`
</h3>

Create DataStore with sequential numbers (useful for testing).

```python theme={null}
DataStore.from_numbers(count, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_numbers(1000000)  # 1M rows with 'number' column
result = ds.filter(ds['number'] % 2 == 0).head(10)  # Even numbers
```

<h3 id="from-random">
  `from_random`
</h3>

Create DataStore with random data.

```python theme={null}
DataStore.from_random(rows, columns, **kwargs)
```

**Examples:**

```python theme={null}
ds = DataStore.from_random(rows=1000, columns=5)
```

<h3 id="run-sql">
  `run_sql`
</h3>

Create DataStore from a raw SQL query.

```python theme={null}
DataStore.run_sql(query)
```

**Examples:**

```python theme={null}
ds = DataStore.run_sql("""
    SELECT number, number * 2 as doubled
    FROM numbers(100)
    WHERE number % 10 = 0
""")
```
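
For reference, the rows this query selects can be reproduced in plain Python, given that ClickHouse's `numbers(100)` table function yields the integers 0 through 99. The snippet does not call chdb, and the name `expected_rows` is illustrative.

```python
# Plain-Python equivalent of the query: keep multiples of 10 and pair each with its double.
expected_rows = [(n, n * 2) for n in range(100) if n % 10 == 0]

for number, doubled in expected_rows:
    print(number, doubled)
```

A comparison like this can serve as a quick sanity check when experimenting with raw SQL sources.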

***

<h2 id="summary">
  Summary Table
</h2>

| Method              | Source Type          | Example                                                  |
| ------------------- | -------------------- | -------------------------------------------------------- |
| `uri()`             | Universal            | `DataStore.uri("s3://bucket/data.parquet")`              |
| `from_file()`       | Local/Remote files   | `DataStore.from_file("data.csv")`                        |
| `read_csv()`        | CSV files            | `pd.read_csv("data.csv")`                                |
| `read_parquet()`    | Parquet files        | `pd.read_parquet("data.parquet")`                        |
| `from_s3()`         | Amazon S3            | `DataStore.from_s3("s3://bucket/path")`                  |
| `from_gcs()`        | Google Cloud Storage | `DataStore.from_gcs("gs://bucket/path")`                 |
| `from_azure()`      | Azure Blob           | `DataStore.from_azure("az://container/path")`            |
| `from_hdfs()`       | HDFS                 | `DataStore.from_hdfs("hdfs://host/path")`                |
| `from_url()`        | HTTP/HTTPS           | `DataStore.from_url("https://example.com/data.csv")`     |
| `from_mysql()`      | MySQL                | `DataStore.from_mysql(host, db, table, user, pass)`      |
| `from_postgresql()` | PostgreSQL           | `DataStore.from_postgresql(host, db, table, user, pass)` |
| `from_clickhouse()` | ClickHouse           | `DataStore.from_clickhouse(host, db, table)`             |
| `from_mongodb()`    | MongoDB              | `DataStore.from_mongodb(uri, db, collection)`            |
| `from_sqlite()`     | SQLite               | `DataStore.from_sqlite("data.db", table)`                |
| `from_iceberg()`    | Apache Iceberg       | `DataStore.from_iceberg("/path/to/table")`               |
| `from_delta()`      | Delta Lake           | `DataStore.from_delta("/path/to/table")`                 |
| `from_hudi()`       | Apache Hudi          | `DataStore.from_hudi("/path/to/table")`                  |
| `from_df()`         | pandas DataFrame     | `DataStore.from_df(pandas_df)`                           |
| `DataFrame()`       | Dictionary/DataFrame | `pd.DataFrame({'a': [1, 2, 3]})`                         |
| `from_numbers()`    | Sequential numbers   | `DataStore.from_numbers(1000000)`                        |
| `from_random()`     | Random data          | `DataStore.from_random(rows=1000, columns=5)`            |
| `run_sql()`         | Raw SQL              | `DataStore.run_sql("SELECT * FROM ...")`                 |
