> ## Documentation Index
> Fetch the complete documentation index at: https://private-7c7dfe99-fix-nav-issues.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

> Page detailing allocation profiling in ClickHouse

# Allocation profiling

ClickHouse uses [jemalloc](https://github.com/jemalloc/jemalloc) as its global allocator. Jemalloc comes with tools for allocation sampling and profiling.

ClickHouse and Keeper allow you to control sampling using configs, query settings, `SYSTEM` commands and four letter word (4LW) commands in Keeper. There are several ways to inspect the results:

* Collect samples into `system.trace_log` under the `JemallocSample` type for per-query analysis.
* View live memory statistics and fetch heap profiles through the built-in [jemalloc web UI](#jemalloc-web-ui) (26.2+).
* Query the current heap profile directly from SQL using [`system.jemalloc_profile_text`](#fetching-heap-profiles-from-sql) (26.2+).
* Flush heap profiles to disk and analyze them with [`jeprof`](#analyzing-heap-profile-files-with-jeprof).

<Note>
  This guide is applicable for versions 25.9+.
  For older versions, please check [allocation profiling for versions before 25.9](/concepts/features/performance/allocation-profiling-old).
</Note>

<h2 id="sampling-allocations">
  Sampling allocations
</h2>

To sample and profile allocations, start ClickHouse/Keeper with the `jemalloc_enable_global_profiler` config enabled:

```xml theme={null}
<clickhouse>
    <jemalloc_enable_global_profiler>1</jemalloc_enable_global_profiler>
</clickhouse>
```

`jemalloc` will sample allocations and store the information internally.

You can also enable sampling per query using the `jemalloc_enable_profiler` setting.

<Warning>
  **Warning**

  Because ClickHouse is an allocation-heavy application, jemalloc sampling may incur performance overhead.
</Warning>

<h2 id="storing-jemalloc-samples-in-system-trace-log">
  Storing jemalloc samples in `system.trace_log`
</h2>

You can store jemalloc samples in `system.trace_log` under the `JemallocSample` type.
To enable it globally, use the `jemalloc_collect_global_profile_samples_in_trace_log` config:

```xml theme={null}
<clickhouse>
    <jemalloc_collect_global_profile_samples_in_trace_log>1</jemalloc_collect_global_profile_samples_in_trace_log>
</clickhouse>
```

<Warning>
  **Warning**

  Because ClickHouse is an allocation-heavy application, collecting all samples in system.trace\_log may incur high load.
</Warning>

You can also enable it per query using the `jemalloc_collect_profile_samples_in_trace_log` setting.

<h3 id="example-analyzing-memory-usage-trace-log">
  Example: analyzing memory usage of a query
</h3>

First, run a query with the jemalloc profiler enabled and collect the samples into `system.trace_log`:

```sql theme={null}
SELECT *
FROM numbers(1000000)
ORDER BY number DESC
SETTINGS max_bytes_ratio_before_external_sort = 0
FORMAT `Null`
SETTINGS jemalloc_enable_profiler = 1, jemalloc_collect_profile_samples_in_trace_log = 1

Query id: 8678d8fe-62c5-48b8-b0cd-26851c62dd75

Ok.

0 rows in set. Elapsed: 0.009 sec. Processed 1.00 million rows, 8.00 MB (108.58 million rows/s., 868.61 MB/s.)
Peak memory usage: 12.65 MiB.
```

<Note>
  If ClickHouse was started with `jemalloc_enable_global_profiler`, you don't have to enable `jemalloc_enable_profiler`.
  Same is true for `jemalloc_collect_global_profile_samples_in_trace_log` and `jemalloc_collect_profile_samples_in_trace_log`.
</Note>

Flush the `system.trace_log`:

```sql theme={null}
SYSTEM FLUSH LOGS trace_log
```

Then query it to get cumulative memory usage over time:

```sql theme={null}
WITH per_bucket AS
(
    SELECT
        event_time_microseconds AS bucket_time,
        sum(size) AS bucket_sum
    FROM system.trace_log
    WHERE trace_type = 'JemallocSample'
      AND query_id = '8678d8fe-62c5-48b8-b0cd-26851c62dd75'
    GROUP BY bucket_time
)
SELECT
    bucket_time,
    sum(bucket_sum) OVER (
        ORDER BY bucket_time ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_size,
    formatReadableSize(cumulative_size) AS cumulative_size_readable
FROM per_bucket
ORDER BY bucket_time
```

Find the time where memory usage was the highest:

```sql theme={null}
SELECT
    argMax(bucket_time, cumulative_size),
    max(cumulative_size)
FROM
(
    WITH per_bucket AS
    (
        SELECT
            event_time_microseconds AS bucket_time,
            sum(size) AS bucket_sum
        FROM system.trace_log
        WHERE trace_type = 'JemallocSample'
          AND query_id = '8678d8fe-62c5-48b8-b0cd-26851c62dd75'
        GROUP BY bucket_time
    )
    SELECT
        bucket_time,
        sum(bucket_sum) OVER (
            ORDER BY bucket_time ASC
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_size,
        formatReadableSize(cumulative_size) AS cumulative_size_readable
    FROM per_bucket
    ORDER BY bucket_time
)
```

Using that result, see which allocation stacks were most active at the peak:

```sql theme={null}
SELECT
    concat(
        '\n',
        arrayStringConcat(
            arrayMap(
                (x, y) -> concat(x, ': ', y),
                arrayMap(x -> addressToLine(x), allocation_trace),
                arrayMap(x -> demangle(addressToSymbol(x)), allocation_trace)
            ),
            '\n'
        )
    ) AS symbolized_trace,
    sum(s) AS per_trace_sum
FROM
(
    SELECT
        ptr,
        sum(size) AS s,
        argMax(trace, event_time_microseconds) AS allocation_trace
    FROM system.trace_log
    WHERE trace_type = 'JemallocSample'
      AND query_id = '8678d8fe-62c5-48b8-b0cd-26851c62dd75'
      AND event_time_microseconds <= '2025-09-04 11:56:21.737139'
    GROUP BY ptr
    HAVING s > 0
)
GROUP BY ALL
ORDER BY per_trace_sum ASC
```

<h2 id="jemalloc-web-ui">
  Jemalloc web UI
</h2>

<Note>
  This section is applicable for versions 26.2+.
</Note>

ClickHouse provides a built-in web UI for viewing jemalloc memory statistics at the `/jemalloc` HTTP endpoint.
It displays live memory metrics with charts, including allocated, active, resident, and mapped memory, as well as per-arena and per-bin statistics.
You can also fetch global and per-query heap profiles directly from the UI.

<Tabs>
  <Tab title="ClickHouse">
    ```text theme={null}
    http://localhost:8123/jemalloc
    ```

    The server UI includes all tabs: Summary, Allocations, Arenas, Operations, Global Profiler, Query Profiler, and Raw Output.
  </Tab>

  <Tab title="Keeper">
    ```text theme={null}
    http://localhost:9182/jemalloc
    ```

    The Keeper UI is available on the HTTP control port. This port is **disabled by default** and must be explicitly enabled by setting `keeper_server.http_control.port` in the Keeper configuration:

    ```xml theme={null}
    <clickhouse>
        <keeper_server>
            <http_control>
                <port>9182</port>
            </http_control>
        </keeper_server>
    </clickhouse>
    ```

    Once enabled, the UI provides the same visualizations as the server — Summary, Allocations, Arenas, Operations, Global Profiler, and Raw Output — except for the Query Profiler tab which requires SQL and `system.trace_log`.

    <Warning>
      **Security**

      The Keeper HTTP control port does not have application-level authentication. Unlike the ClickHouse Server jemalloc UI — where all data queries go through the SQL HTTP handler and require user/password credentials — the Keeper REST API endpoints are unauthenticated. This is consistent with other Keeper HTTP control endpoints (commands, storage, dashboard).

      Restrict access to this port using network-level controls: bind Keeper to localhost, use firewall rules, or place it behind a reverse proxy with authentication. When no `listen_host` is configured, Keeper defaults to listening on localhost only.
    </Warning>

    Keeper also exposes REST API endpoints for programmatic access:

    * `GET /jemalloc/stats` — raw `malloc_stats_print` output
    * `GET /jemalloc/status` — profiling state as JSON (`prof_enabled`, `prof_active`, `thread_active_init`, `lg_sample`)
    * `GET /jemalloc/profile?format={collapsed|raw}` — flushes a heap profile with server-side symbolization, returns collapsed stacks suitable for flame graph rendering (default) or the raw jemalloc dump
  </Tab>
</Tabs>

<h2 id="fetching-heap-profiles-from-sql">
  Fetching heap profiles from SQL
</h2>

<Note>
  This section is applicable for versions 26.2+.
</Note>

The `system.jemalloc_profile_text` system table lets you fetch and view the current jemalloc heap profile directly from SQL, without needing external tools or flushing to disk first.

The table has a single column:

| Column | Type   | Description                                     |
| ------ | ------ | ----------------------------------------------- |
| `line` | String | Line from the symbolized jemalloc heap profile. |

You can query the table directly — there is no need to flush a heap profile beforehand:

```sql theme={null}
SELECT * FROM system.jemalloc_profile_text
```

<h3 id="output-format">
  Output format
</h3>

The output format is controlled by the `jemalloc_profile_text_output_format` setting, which supports three values:

* `raw` — raw heap profile as produced by jemalloc.
* `symbolized` — jeprof-compatible format with embedded function symbols. Since symbols are already embedded, `jeprof` can analyze the output without requiring the ClickHouse binary.
* `collapsed` (default) — FlameGraph-compatible collapsed stacks, one stack per line with the byte count.

For example, to get the raw profile:

```sql theme={null}
SELECT * FROM system.jemalloc_profile_text
SETTINGS jemalloc_profile_text_output_format = 'raw'
```

To get symbolized output:

```sql theme={null}
SELECT * FROM system.jemalloc_profile_text
SETTINGS jemalloc_profile_text_output_format = 'symbolized'
```

<h3 id="fetching-heap-profiles-settings">
  Additional settings
</h3>

* `jemalloc_profile_text_symbolize_with_inline` (Bool, default: `true`) — Whether to include inline frames when symbolizing. Disabling this speeds up symbolization significantly but loses precision as inlined function calls will not appear in the stacks. Only affects `symbolized` and `collapsed` formats.
* `jemalloc_profile_text_collapsed_use_count` (Bool, default: `false`) — When using the `collapsed` format, aggregate by allocation count instead of bytes.

<h3 id="example-flamegraph-from-sql">
  Example: generating a flame graph from SQL
</h3>

Since the default output format is `collapsed`, you can pipe the output directly to FlameGraph:

```sh theme={null}
clickhouse-client -q "SELECT * FROM system.jemalloc_profile_text" | flamegraph.pl --color=mem --title="Allocation Flame Graph" --width 2400 > result.svg
```

To generate a flame graph by allocation count instead of bytes:

```sh theme={null}
clickhouse-client -q "SELECT * FROM system.jemalloc_profile_text SETTINGS jemalloc_profile_text_collapsed_use_count = 1" | flamegraph.pl --color=mem --title="Allocation Count Flame Graph" --width 2400 > result.svg
```

<h2 id="flushing-heap-profiles">
  Flushing heap profiles to disk
</h2>

If you need to save heap profiles as files for offline analysis with `jeprof`, you can flush them to disk.

By default, the heap profile file will be generated in `/tmp/jemalloc_clickhouse._pid_._seqnum_.heap` where `_pid_` is the PID of ClickHouse and `_seqnum_` is the global sequence number for the current heap profile.
For Keeper, the default file is `/tmp/jemalloc_keeper._pid_._seqnum_.heap`, and follows the same rules.

To flush the current profile:

<Tabs>
  <Tab title="ClickHouse">
    ```sql theme={null}
    SYSTEM JEMALLOC FLUSH PROFILE
    ```

    It will return the location of the flushed profile.
  </Tab>

  <Tab title="Keeper">
    ```sh theme={null}
    echo jmfp | nc localhost 9181
    ```
  </Tab>
</Tabs>

A different location can be defined by appending the `MALLOC_CONF` environment variable with the `prof_prefix` option.
For example, if you want to generate profiles in the `/data` folder where the filename prefix will be `my_current_profile`, you can run ClickHouse/Keeper with the following environment variable:

```sh theme={null}
MALLOC_CONF=prof_prefix:/data/my_current_profile
```

The generated file will be appended to the prefix PID and sequence number.

<h2 id="analyzing-heap-profile-files-with-jeprof">
  Analyzing heap profile files with `jeprof`
</h2>

After flushing heap profiles to disk, they can be analyzed using `jemalloc`'s tool called [jeprof](https://github.com/jemalloc/jemalloc/blob/dev/bin/jeprof.in). It can be installed in multiple ways:

* Using the system's package manager
* Cloning the [jemalloc repo](https://github.com/jemalloc/jemalloc) and running `autogen.sh` from the root folder. This will provide you with the `jeprof` script inside the `bin` folder

There are many different output formats available. Run `jeprof --help` for the full list of options.

<h3 id="symbolized-heap-profiles">
  Symbolized heap profiles
</h3>

Starting from version 26.1+, ClickHouse automatically generates symbolized heap profiles when you flush using `SYSTEM JEMALLOC FLUSH PROFILE`.
The symbolized profile (with `.symbolized` extension) contains embedded function symbols and can be analyzed by `jeprof` without requiring the ClickHouse binary.

For example, when you run:

```sql theme={null}
SYSTEM JEMALLOC FLUSH PROFILE
```

ClickHouse will return the path to the symbolized profile (e.g., `/tmp/jemalloc_clickhouse.12345.0.heap.symbolized`).

You can then analyze it directly with `jeprof`:

```sh theme={null}
jeprof /tmp/jemalloc_clickhouse.12345.0.heap.symbolized --output_format [ > output_file]
```

<Note>
  **No binary required**: When using symbolized profiles (`.symbolized` files), you don't need to provide the ClickHouse binary path to `jeprof`. This makes it much easier to analyze profiles on different machines or after the binary has been updated.
</Note>

If you have an older non-symbolized heap profile and still have access to the ClickHouse binary, you can use the traditional approach:

```sh theme={null}
jeprof path/to/clickhouse path/to/heap/profile --output_format [ > output_file]
```

<Note>
  For non-symbolized profiles, `jeprof` uses `addr2line` to generate stacktraces which can be really slow.
  If that's the case, it is recommended to install an [alternative implementation](https://github.com/gimli-rs/addr2line) of the tool.

  ```bash theme={null}
  git clone https://github.com/gimli-rs/addr2line.git --depth=1 --branch=0.23.0
  cd addr2line
  cargo build --features bin --release
  cp ./target/release/addr2line path/to/current/addr2line
  ```

  Alternatively, `llvm-addr2line` works equally well (But note, that `llvm-objdump` is not compatible with `jeprof`)

  And later use it like this `jeprof --tools addr2line:/usr/bin/llvm-addr2line,nm:/usr/bin/llvm-nm,objdump:/usr/bin/objdump,c++filt:/usr/bin/llvm-cxxfilt`
</Note>

When comparing two profiles, you can use the `--base` argument:

```sh theme={null}
jeprof --base /path/to/first.heap.symbolized /path/to/second.heap.symbolized --output_format [ > output_file]
```

<h3 id="examples">
  Examples
</h3>

Using symbolized profiles (recommended):

* Generate a text file with each procedure written per line:

```sh theme={null}
jeprof /tmp/jemalloc_clickhouse.12345.0.heap.symbolized --text > result.txt
```

* Generate a PDF file with a call-graph:

```sh theme={null}
jeprof /tmp/jemalloc_clickhouse.12345.0.heap.symbolized --pdf > result.pdf
```

Using non-symbolized profiles (requires binary):

* Generate a text file with each procedure written per line:

```sh theme={null}
jeprof /path/to/clickhouse /tmp/jemalloc_clickhouse.12345.0.heap --text > result.txt
```

* Generate a PDF file with a call-graph:

```sh theme={null}
jeprof /path/to/clickhouse /tmp/jemalloc_clickhouse.12345.0.heap --pdf > result.pdf
```

<h3 id="generating-flame-graph">
  Generating a flame graph
</h3>

`jeprof` allows you to generate collapsed stacks for building flame graphs.

You need to use the `--collapsed` argument:

```sh theme={null}
jeprof /tmp/jemalloc_clickhouse.12345.0.heap.symbolized --collapsed > result.collapsed
```

Or with a non-symbolized profile:

```sh theme={null}
jeprof /path/to/clickhouse /tmp/jemalloc_clickhouse.12345.0.heap --collapsed > result.collapsed
```

After that, you can use many different tools to visualize collapsed stacks.

The most popular is [FlameGraph](https://github.com/brendangregg/FlameGraph) which contains a script called `flamegraph.pl`:

```sh theme={null}
cat result.collapsed | /path/to/FlameGraph/flamegraph.pl --color=mem --title="Allocation Flame Graph" --width 2400 > result.svg
```

Another interesting tool is [speedscope](https://www.speedscope.app/) that allows you to analyze collected stacks in a more interactive way.

<h2 id="additional-options-for-profiler">
  Additional options for the profiler
</h2>

`jemalloc` has many different options available, which are related to the profiler. They can be controlled by modifying the `MALLOC_CONF` environment variable.
For example, the interval between allocation samples can be controlled with `lg_prof_sample`.
If you want to dump the heap profile every N bytes you can enable it using `lg_prof_interval`.

It is recommended to check `jemalloc`s [reference page](https://jemalloc.net/jemalloc.3.html) for a complete list of options.

<h2 id="other-resources">
  Other resources
</h2>

ClickHouse/Keeper expose `jemalloc` related metrics in many different ways.

<Warning>
  **Warning**

  It's important to be aware that none of these metrics are synchronized with each other and values may drift.
</Warning>

<h3 id="system-table-asynchronous_metrics">
  System table `asynchronous_metrics`
</h3>

```sql theme={null}
SELECT *
FROM system.asynchronous_metrics
WHERE metric LIKE '%jemalloc%'
FORMAT Vertical
```

[Reference](/reference/system-tables/asynchronous_metrics)

<h3 id="system-table-jemalloc_bins">
  System table `jemalloc_bins`
</h3>

Contains information about memory allocations done via the jemalloc allocator in different size classes (bins) aggregated from all arenas.

[Reference](/reference/system-tables/jemalloc_bins)

<h3 id="system-table-jemalloc_stats">
  System table `jemalloc_stats` (26.2+)
</h3>

Returns the full output of `malloc_stats_print()` as a single string. Equivalent to the `SYSTEM JEMALLOC STATS` command.

```sql theme={null}
SELECT * FROM system.jemalloc_stats
```

<h3 id="prometheus">
  Prometheus
</h3>

All `jemalloc` related metrics from `asynchronous_metrics` are also exposed using the Prometheus endpoint in both ClickHouse and Keeper.

[Reference](/reference/settings/server-settings/settings#prometheus)

<h3 id="jmst-4lw-command-in-keeper">
  `jmst` 4LW command in Keeper
</h3>

Keeper supports the `jmst` 4LW command which returns [basic allocator statistics](https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Basic-Allocator-Statistics):

```sh theme={null}
echo jmst | nc localhost 9181
```
