> ## Documentation Index
> Fetch the complete documentation index at: https://private-7c7dfe99-fix-nav-issues.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

> Documentation for functions for splitting strings

# Functions for splitting strings

export const DeprecatedBadge = () => {
  return <div className="deprecatedBadge">
            <div className="deprecatedIcon">
            <svg width="14" height="10" viewBox="0 0 14 10" fill="none" xmlns="http://www.w3.org/2000/svg">
                <path d="M13 0H1C0.734784 0 0.48043 0.105357 0.292893 0.292893C0.105357 0.48043 0 0.734784 0 1V2.5C0 2.76522 0.105357 3.01957 0.292893 3.20711C0.48043 3.39464 0.734784 3.5 1 3.5V9C1 9.26522 1.10536 9.51957 1.29289 9.70711C1.48043 9.89464 1.73478 10 2 10H12C12.2652 10 12.5196 9.89464 12.7071 9.70711C12.8946 9.51957 13 9.26522 13 9V3.5C13.2652 3.5 13.5196 3.39464 13.7071 3.20711C13.8946 3.01957 14 2.76522 14 2.5V1C14 0.734784 13.8946 0.48043 13.7071 0.292893C13.5196 0.105357 13.2652 0 13 0ZM12 9H2V3.5H12V9ZM13 2.5H1V1H13V2.5ZM5 5.5C5 5.36739 5.05268 5.24021 5.14645 5.14645C5.24021 5.05268 5.36739 5 5.5 5H8.5C8.63261 5 8.75979 5.05268 8.85355 5.14645C8.94732 5.24021 9 5.36739 9 5.5C9 5.63261 8.94732 5.75979 8.85355 5.85355C8.75979 5.94732 8.63261 6 8.5 6H5.5C5.36739 6 5.24021 5.94732 5.14645 5.85355C5.05268 5.75979 5 5.63261 5 5.5Z" fill="currentColor" />
            </svg>
        </div>
            Deprecated feature
        </div>;
};

<Note>
  The documentation below is generated from the `system.functions` system table.
</Note>

{/*AUTOGENERATED_START*/}

<h2 id="alphaTokens">
  alphaTokens
</h2>

Introduced in: v1.1.0

Selects substrings of consecutive bytes from the ranges `a-z` and `A-Z` and returns an array of the selected substrings.

**Syntax**

```sql theme={null}
alphaTokens(s[, max_substrings])
```

**Aliases**: `splitByAlpha`

**Arguments**

* `s` — The string to split. [`String`](/reference/data-types/string)
* `max_substrings` — Optional. When `max_substrings > 0`, the number of returned substrings will be no more than `max_substrings`, otherwise the function will return as many substrings as possible. [`Int64`](/reference/data-types/int-uint)

**Returned value**

Returns an array of selected substrings of `s`. [`Array(String)`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT alphaTokens('abca1abc');
```

```response title=Response theme={null}
┌─alphaTokens('abca1abc')─┐
│ ['abca','abc']          │
└─────────────────────────┘
```

<h2 id="arrayStringConcat">
  arrayStringConcat
</h2>

Introduced in: v1.1.0

Concatenates string representations of values listed in the array with the provided separator, which is an optional parameter set to an empty string by default.

**Syntax**

```sql theme={null}
arrayStringConcat(arr[, separator])
```

**Aliases**: `array_to_string`

**Arguments**

* `arr` — The array to concatenate. [`Array(T)`](/reference/data-types/array)
* `separator` — Optional. Separator string. By default an empty string. [`const String`](/reference/data-types/string)

**Returned value**

Returns the concatenated string. [`String`](/reference/data-types/string)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT arrayStringConcat(['12/05/2021', '12:50:00'], ' ') AS DateString;
```

```response title=Response theme={null}
┌─DateString──────────┐
│ 12/05/2021 12:50:00 │
└─────────────────────┘
```

<h2 id="extractAllGroupsVertical">
  extractAllGroupsVertical
</h2>

Introduced in: v20.5.0

Matches all groups of a string using a regular expression and returns an array of arrays, where each array includes matching fragments from every group, grouped in order of appearance in the input string.

**Syntax**

```sql theme={null}
extractAllGroupsVertical(s, regexp)
```

**Aliases**: `extractAllGroups`

**Arguments**

* `s` — Input string to extract from. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `regexp` — Regular expression to match by. [`const String`](/reference/data-types/string) or [`const FixedString`](/reference/data-types/fixedstring)

**Returned value**

Returns an array of arrays, where each inner array contains the captured groups from one match. Each match produces an array with elements corresponding to the capturing groups in the regular expression (group 1, group 2, etc.). If no matches are found, returns an empty array. [`Array(Array(String))`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
WITH '< Server: nginx
< Date: Tue, 22 Jan 2019 00:26:14 GMT
< Content-Type: text/html; charset=UTF-8
< Connection: keep-alive
' AS s
SELECT extractAllGroupsVertical(s, '< ([\\w\\-]+): ([^\\r\\n]+)');
```

```response title=Response theme={null}
[['Server','nginx'],['Date','Tue, 22 Jan 2019 00:26:14 GMT'],['Content-Type','text/html; charset=UTF-8'],['Connection','keep-alive']]
```

<h2 id="ngrams">
  ngrams
</h2>

Introduced in: v21.11.0

Splits a UTF-8 string into n-grams of length `N`.

**Syntax**

```sql theme={null}
ngrams(s, N)
```

**Arguments**

* `s` — Input string. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `N` — The n-gram length. [`const UInt8/16/32/64`](/reference/data-types/int-uint)

**Returned value**

Returns an array with n-grams. [`Array(String)`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT ngrams('ClickHouse', 3);
```

```response title=Response theme={null}
['Cli','lic','ick','ckH','kHo','Hou','ous','use']
```

<h2 id="reverseBySeparator">
  reverseBySeparator
</h2>

Introduced in: v26.2.0

Reverses the order of substrings in a string separated by a specified separator.
This function splits the string by the separator, reverses the order of the resulting parts,
and joins them back using the same separator. It is useful for parsing domain names,
file paths, or other hierarchical data where you need to reverse the order of components.

Examples:

* reverseBySeparator('[www.google.com](http://www.google.com)') returns 'com.google.www'
* reverseBySeparator('a/b/c', '/') returns 'c/b/a'
* reverseBySeparator('x::y::z', '::') returns 'z::y::x'

**Syntax**

```sql theme={null}
reverseBySeparator(string[, separator])
```

**Arguments**

* `string` — The input string to reverse the order of its parts. [`String`](/reference/data-types/string)
* `separator` — The separator string used to identify parts. If not provided, uses '.' (dot). Default: '.' [`String`](/reference/data-types/string)

**Returned value**

Returns a string with substrings ordered from right to left of the original string, joined by the same separator. [`String`](/reference/data-types/string)

**Examples**

**Basic domain reversal**

```sql title=Query theme={null}
SELECT reverseBySeparator('www.google.com')
```

```response title=Response theme={null}
'com.google.www'
```

**Path reversal**

```sql title=Query theme={null}
SELECT reverseBySeparator('a/b/c', '/')
```

```response title=Response theme={null}
'c/b/a'
```

**Custom separator**

```sql title=Query theme={null}
SELECT reverseBySeparator('x::y::z', '::')
```

```response title=Response theme={null}
'z::y::x'
```

**Edge case with dots**

```sql title=Query theme={null}
SELECT reverseBySeparator('.a.b.', '.')
```

```response title=Response theme={null}
'.b.a.'
```

**Single element**

```sql title=Query theme={null}
SELECT reverseBySeparator('single')
```

```response title=Response theme={null}
'single'
```

**Empty separator**

```sql title=Query theme={null}
SELECT reverseBySeparator('abcde', '')
```

```response title=Response theme={null}
'edcba'
```

<h2 id="splitByChar">
  splitByChar
</h2>

Introduced in: v1.1.0

Splits a string separated by a specified constant string `separator` of exactly one character into an array of substrings.
Empty substrings may be selected if the separator occurs at the beginning or end of the string, or if there are multiple consecutive separators.

<Note>
  Setting [`splitby_max_substrings_includes_remaining_string`](/reference/settings/session-settings#splitby_max_substrings_includes_remaining_string) (default: `0`) controls if the remaining string is included in the last element of the result array when argument `max_substrings > 0`.
</Note>

Empty substrings may be selected when:

* A separator occurs at the beginning or end of the string
* There are multiple consecutive separators
* The original string `s` is empty

**Syntax**

```sql theme={null}
splitByChar(separator, s[, max_substrings])
```

**Arguments**

* `separator` — The separator must be a single-byte character. [`String`](/reference/data-types/string)
* `s` — The string to split. [`String`](/reference/data-types/string)
* `max_substrings` — Optional. If `max_substrings > 0`, the returned array will contain at most `max_substrings` substrings, otherwise the function will return as many substrings as possible. The default value is `0`.  [`Int64`](/reference/data-types/int-uint)

**Returned value**

Returns an array of selected substrings. [`Array(String)`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT splitByChar(',', '1,2,3,abcde');
```

```response title=Response theme={null}
┌─splitByChar(⋯2,3,abcde')─┐
│ ['1','2','3','abcde']    │
└──────────────────────────┘
```

<h2 id="splitByNonAlpha">
  splitByNonAlpha
</h2>

Introduced in: v21.9.0

Splits a string separated by whitespace and punctuation characters into an array of substrings.

<Note>
  Setting [`splitby_max_substrings_includes_remaining_string`](/reference/settings/session-settings#splitby_max_substrings_includes_remaining_string) (default: `0`) controls if the remaining string is included in the last element of the result array when argument `max_substrings > 0`.
</Note>

**Syntax**

```sql theme={null}
splitByNonAlpha(s[, max_substrings])
```

**Arguments**

* `s` — The string to split. [`String`](/reference/data-types/string)
* `max_substrings` — Optional. When `max_substrings > 0`, the returned substrings will be no more than `max_substrings`, otherwise the function will return as many substrings as possible. Default value: `0`. [`Int64`](/reference/data-types/int-uint)

**Returned value**

Returns an array of selected substrings of `s`. [`Array(String)`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT splitByNonAlpha('user@domain.com');
```

```response title=Response theme={null}
['user','domain','com']
```

<h2 id="splitByRegexp">
  splitByRegexp
</h2>

Introduced in: v21.6.0

Splits a string which is separated by the provided regular expression into an array of substrings.
If the provided regular expression is empty, it will split the string into an array of single characters.
If no match is found for the regular expression, the string won't be split.

Empty substrings may be selected when:

* a non-empty regular expression match occurs at the beginning or end of the string
* there are multiple consecutive non-empty regular expression matches
* the original string string is empty while the regular expression is not empty.

<Note>
  Setting [`splitby_max_substrings_includes_remaining_string`](/reference/settings/session-settings#splitby_max_substrings_includes_remaining_string) (default: `0`) controls if the remaining string is included in the last element of the result array when argument `max_substrings > 0`.
</Note>

**Syntax**

```sql theme={null}
splitByRegexp(regexp, s[, max_substrings])
```

**Arguments**

* `regexp` — Regular expression. Constant. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `s` — The string to split. [`String`](/reference/data-types/string)
* `max_substrings` — Optional. When `max_substrings > 0`, the returned substrings will be no more than `max_substrings`, otherwise the function will return as many substrings as possible. Default value: `0`. [`Int64`](/reference/data-types/int-uint)

**Returned value**

Returns an array of the selected substrings of `s`. [`Array(String)`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT splitByRegexp('\\d+', 'a12bc23de345f');
```

```response title=Response theme={null}
┌─splitByRegex⋯c23de345f')─┐
│ ['a12bc23de345f']        │
└──────────────────────────┘
```

**Empty regexp**

```sql title=Query theme={null}
SELECT splitByRegexp('', 'abcde');
```

```response title=Response theme={null}
┌─splitByRegexp('', 'abcde')─┐
│ ['a','b','c','d','e']      │
└────────────────────────────┘
```

<h2 id="splitByString">
  splitByString
</h2>

Introduced in: v1.1.0

Splits a string with a constant `separator` consisting of multiple characters into an array of substrings.
If the string `separator` is empty, it will split the string `s` into an array of single characters.

Empty substrings may be selected when:

* A non-empty separator occurs at the beginning or end of the string
* There are multiple consecutive non-empty separators
* The original string `s` is empty while the separator is not empty

<Note>
  Setting [`splitby_max_substrings_includes_remaining_string`](/reference/settings/session-settings#splitby_max_substrings_includes_remaining_string) (default: `0`) controls if the remaining string is included in the last element of the result array when argument `max_substrings > 0`.
</Note>

**Syntax**

```sql theme={null}
splitByString(separator, s[, max_substrings])
```

**Arguments**

* `separator` — The separator. [`String`](/reference/data-types/string)
* `s` — The string to split. [`String`](/reference/data-types/string)
* `max_substrings` — Optional. When `max_substrings > 0`, the returned substrings will be no more than `max_substrings`, otherwise the function will return as many substrings as possible. Default value: `0`. [`Int64`](/reference/data-types/int-uint)

**Returned value**

Returns an array of selected substrings of `s` [`Array(String)`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT splitByString(', ', '1, 2 3, 4,5, abcde');
```

```response title=Response theme={null}
┌─splitByStrin⋯4,5, abcde')─┐
│ ['1','2 3','4,5','abcde'] │
└───────────────────────────┘
```

**Empty separator**

```sql title=Query theme={null}
SELECT splitByString('', 'abcde');
```

```response title=Response theme={null}
┌─splitByString('', 'abcde')─┐
│ ['a','b','c','d','e']      │
└────────────────────────────┘
```

<h2 id="splitByWhitespace">
  splitByWhitespace
</h2>

Introduced in: v21.9.0

Splits a string which is separated by whitespace characters into an array of substrings.

<Note>
  Setting [`splitby_max_substrings_includes_remaining_string`](/reference/settings/session-settings#splitby_max_substrings_includes_remaining_string) (default: `0`) controls if the remaining string is included in the last element of the result array when argument `max_substrings > 0`.
</Note>

**Syntax**

```sql theme={null}
splitByWhitespace(s[, max_substrings])
```

**Arguments**

* `s` — The string to split. [`String`](/reference/data-types/string)
* `max_substrings` — Optional. When `max_substrings > 0`, the returned substrings will be no more than `max_substrings`, otherwise the function will return as many substrings as possible. Default value: `0`. [`Int64`](/reference/data-types/int-uint)

**Returned value**

Returns an array of the selected substrings of `s`. [`Array(String)`](/reference/data-types/array)

**Examples**

**Usage example**

```sql title=Query theme={null}
SELECT splitByWhitespace('  1!  a,  b.  ');
```

```response title=Response theme={null}
['1!','a,','b.']
```

<h2 id="tokens">
  tokens
</h2>

Introduced in: v21.11.0

Splits a string into tokens using the given tokenizer.

Available tokenizers:

* `splitByNonAlpha` splits strings along non-alphanumeric ASCII characters (also see function [splitByNonAlpha](/reference/functions/regular-functions/splitting-merging-functions#splitByNonAlpha)).
* `splitByString(S)` splits strings along certain user-defined separator strings `S` (also see function [splitByString](/reference/functions/regular-functions/splitting-merging-functions#splitByString)). The separators can be specified using an optional parameter, for example, `tokens(value, 'splitByString', [', ', '; ', '\n', '\\'])`. Note that each string can consist of multiple characters (`', '` in the example). The default separator list, if not specified explicitly, is a single whitespace `[' ']`.
* `asciiCJK` splits strings into tokens using Unicode word boundary rules (similar to UAX #29). ASCII alphanumeric characters and underscores form tokens with connectors (`:` for letters, `.` and `'` for same-type characters). Non-ASCII Unicode characters become single-character tokens.
* `ngrams(N)` splits strings into equally large `N`-grams (also see function [ngrams](/reference/functions/regular-functions/splitting-merging-functions#ngrams)). The ngram length can be specified using an optional integer parameter between 1 and 8, for example, `tokens(value, 'ngrams', 3)`. The default ngram size, if not specified explicitly, is 3.
* `sparseGrams(min_length, max_length, min_cutoff_length)` splits strings into variable-length n-grams of at least `min_length` and at most `max_length` (inclusive) characters (also see function [sparseGrams](/reference/functions/regular-functions/string-functions#sparseGrams)). Unless specified explicitly, `min_length` and `max_length` default to 3 and 100. If parameter `min_cutoff_length` is provided, only n-grams with length greater or equal than `min_cutoff_length` are returned. Compared to `ngrams(N)`, the `sparseGrams` tokenizer produces variable-length N-grams, allowing for a more flexible representation of the original text. For example, `tokens(value, 'sparseGrams', 3, 5, 4)` internally generates 3-, 4-, 5-grams from the input string but only the 4- and 5-grams are returned.
* `array` performs no tokenization, i.e. every row value is a token (also see function [array](/reference/functions/regular-functions/array-functions#array)).

In case of the `splitByString` tokenizer, if the tokens do not form a [prefix code](https://en.wikipedia.org/wiki/Prefix_code), you likely want that the matching prefers longer separators first.
To do so, pass the separators in order of descending length.
For example, with separators = `['%21', '%']` string `%21abc` would be tokenized as `['abc']`, whereas separators = `['%', '%21']` would tokenize to `['21ac']` (which is likely not what you wanted).

**Syntax**

```sql theme={null}
tokens(value) -- 'splitByNonAlpha' tokenizer
tokens(value, 'splitByNonAlpha')
tokens(value, 'splitByString'[, separators])
tokens(value, 'asciiCJK')
tokens(value, 'ngrams'[, n])
tokens(value, 'sparseGrams'[, min_length, max_length[, min_cutoff_length]])
tokens(value, 'array')
```

**Arguments**

* `value` — The input string. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `tokenizer` — The tokenizer to use. Valid arguments are `splitByNonAlpha`, `splitByString`, `asciiCJK`, `ngrams`, `sparseGrams`, and `array`. Optional, if not set explicitly, defaults to `splitByNonAlpha`. [`const String`](/reference/data-types/string)
* `n` — Only relevant if argument `tokenizer` is `ngrams`: An optional parameter which defines the length of the ngrams. If not set explicitly, defaults to `3`. [`const UInt8`](/reference/data-types/int-uint)
* `separators` — Only relevant if argument `tokenizer` is `split`: An optional parameter which defines the separator strings. If not set explicitly, defaults to `[' ']`. [`const Array(String)`](/reference/data-types/array)
* `min_length` — Only relevant if argument `tokenizer` is `sparseGrams`: An optional parameter which defines the minimum gram length, defaults to 3. [`const UInt8`](/reference/data-types/int-uint)
* `max_length` — Only relevant if argument `tokenizer` is `sparseGrams`: An optional parameter which defines the maximum gram length, defaults to 100. [`const UInt8`](/reference/data-types/int-uint)
* `min_cutoff_length` — Only relevant if argument `tokenizer` is `sparseGrams`: An optional parameter which defines the minimum cutoff length. [`const UInt8`](/reference/data-types/int-uint)

**Returned value**

Returns the resulting array of tokens from input string. [`Array`](/reference/data-types/array)

**Examples**

**Default tokenizer**

```sql title=Query theme={null}
SELECT tokens('test1,;\\\\ test2,;\\\\ test3,;\\\\   test4') AS tokens;
```

```response title=Response theme={null}
['test1','test2','test3','test4']
```

**Ngram tokenizer**

```sql title=Query theme={null}
SELECT tokens('abc def', 'ngrams', 3) AS tokens;
```

```response title=Response theme={null}
['abc','bc ','c d',' de','def']
```

<h2 id="tokensForLikePattern">
  tokensForLikePattern
</h2>

Introduced in: v26.3.0

Splits a LIKE pattern string into tokens using the specified tokenizer.

Unlike the `tokens` function, this function is aware of LIKE pattern semantics
(such as leading and trailing wildcard characters) and applies tokenizer-specific
rules to extract meaningful tokens for pattern matching.

It supports the same argument sets as the `tokens` function; additional
arguments after `tokenizer` are interpreted according to the selected
tokenizer (for example, `n` for `ngrams`, `separators` for `splitByString`,
and `min_length` / `max_length` \[/ `min_cutoff_length`] for `sparseGrams`).

This function is primarily intended for debugging and testing purposes,
and is used internally to analyze tokenization behavior for LIKE patterns.

**Syntax**

```sql theme={null}
tokensForLikePattern(value[, tokenizer[, tokenizer_specific_arguments...]])
```

**Arguments**

* `value` — The input string. [`String`](/reference/data-types/string) or [`FixedString`](/reference/data-types/fixedstring)
* `tokenizer` — The tokenizer to use. Valid arguments are `splitByNonAlpha`, `splitByString`, `asciiCJK`, `ngrams`, `sparseGrams`, and `array`. Optional, if not set explicitly, defaults to `splitByNonAlpha`. [`const String`](/reference/data-types/string)
* `n` — Only relevant if argument `tokenizer` is `ngrams`: An optional parameter which defines the length of the ngrams. If not set explicitly, defaults to `3`. [`const UInt8`](/reference/data-types/int-uint)
* `separators` — Only relevant if argument `tokenizer` is `split`: An optional parameter which defines the separator strings. If not set explicitly, defaults to `[' ']`. [`const Array(String)`](/reference/data-types/array)
* `min_length` — Only relevant if argument `tokenizer` is `sparseGrams`: An optional parameter which defines the minimum gram length, defaults to 3. [`const UInt8`](/reference/data-types/int-uint)
* `max_length` — Only relevant if argument `tokenizer` is `sparseGrams`: An optional parameter which defines the maximum gram length, defaults to 100. [`const UInt8`](/reference/data-types/int-uint)
* `min_cutoff_length` — Only relevant if argument `tokenizer` is `sparseGrams`: An optional parameter which defines the minimum cutoff length. [`const UInt8`](/reference/data-types/int-uint)

**Returned value**

Returns the resulting array of tokens from input string. [`Array`](/reference/data-types/array)

**Examples**

**Default tokenizer**

```sql title=Query theme={null}
SELECT tokensForLikePattern('%test1,test2,test3%') AS tokens;
```

```response title=Response theme={null}
['test2']
```
