Field value extraction functions
Introduction
Field value extraction functions turn raw log content into structured fields. A common use case of key-value extraction functions is shown in the following figure. Once logs are processed into structured data, they can be used in SQL analysis scenarios.
e_regex function
Function definition
Extract values from a field with a regular expression and write them to the target fields.
Syntax description
e_regex(field, regex, fields_info=None, mode="fill-auto", pack_json='')
Parameter description
| Parameter name | Parameter description | Parameter type | Required or not | Parameter default | Parameter range |
|---|---|---|---|---|---|
| field | The field name to be extracted | String | Yes | - | - |
| regex | Regular expression | String | Yes | - | - |
| fields_info | The target field names after matching. This parameter is required when the regular expression does not use named capture groups. | List<Table> | No | - | - |
| mode | Field overwriting mode. The default is fill-auto | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |
| pack_json | Pack all matching results of the regular expression into the field specified by pack_json. The default value is empty, indicating no packing. | String | No | - | - |
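The way successive matches map onto fields_info can be illustrated with a minimal Python sketch. It is not the actual implementation of e_regex: the helper name regex_extract and the CONVERTERS table are hypothetical, only the case without named capture groups is covered, and the mode parameter is ignored (extracted fields simply overwrite existing ones).

```python
import re

# Hypothetical converters for the type names used in fields_info.
CONVERTERS = {"long": int, "string": str}


def regex_extract(event, field, regex, fields_info, pack_json=""):
    """Assign successive regex matches to the target fields, in order."""
    matches = re.findall(regex, event[field])
    extracted = {}
    # zip stops at the shorter sequence, so extra matches or targets are ignored.
    for target, value in zip(fields_info, matches):
        name, type_name = next(iter(target.items()))
        extracted[name] = CONVERTERS[type_name](value)

    result = dict(event)
    if pack_json:
        # Pack all extracted values into a single field.
        result[pack_json] = extracted
    else:
        # Assumption: mode handling is omitted, existing fields are overwritten.
        result.update(extracted)
    return result


log = {"content": "1234abcd5678"}
print(regex_extract(log, "content", r"\d+", [{"target1": "long"}]))
print(regex_extract(log, "content", r"\d+",
                    [{"target1": "long"}, {"target2": "long"}], pack_json="new"))
```

The two calls correspond to the inputs of Example 1 and Example 4 in this section.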
Example
- Example 1
Original log:
{"content": "1234abcd5678"}
Processing rules:
e_regex("content", "\d+", [{'target1':'long'}])
Processing results:
{"content": "1234abcd5678", "target1": 1234}
- Example 2
Original log:
{"content": "1234abcd"}
Processing rules:
e_regex("content", "(?<target1>\d+)(.*)", [{'target2':'string'}])
Processing results:
{"content": "1234abcd", "target1": "1234", "target2": "abcd"}
- Example 3
Original log:
{"content": "1234abcd5678"}
Processing rules:
e_regex("content", "\d+", [{'target1':'long'}, {'target2':'long'}])
Processing results:
{"content": "1234abcd5678", "target1": 1234, "target2": 5678}
- Example 4
Original log:
{"content": "1234abcd5678"}
Processing rules:
e_regex("content", "\d+", [{'target1':'long'}, {'target2':'long'}], pack_json='new')
Processing results:
{"content": "1234abcd5678", "new": {"target1": 1234, "target2": 5678}}
e_json function
Function definition
Extract field values from JSON.
Syntax description
e_json(field, depth=100, prefix="", suffix="", fmt="simple", sep=".", mode="fill-auto")
Parameter description
| Parameter name | Parameter description | Parameter type | Required or not | Parameter default | Parameter range |
|---|---|---|---|---|---|
| field | The field name to be extracted | String | Yes | - | - |
| depth | The depth of field expansion. Valid values are 1 to 2000. A value of 1 expands only the first layer. The default is 100. | Int | No | 100 | 1~2000 |
| prefix | The prefix added to the field name when expanding. | String | No | - | - |
| suffix | The suffix added to the field name when expanding. | String | No | - | - |
| fmt | Formatting method | String | No | simple | simple (default value): indicates using the node name as the field name; full: indicates combining the parent node and the current node as the field name; parent: indicates using the complete path as the field name; root: indicates combining the root node and the current node as the field name |
| sep | The separator for formatting parent-child nodes. It needs to be set when fmt is full, parent, or root. The default is a period (.). | String | No | . | - |
| mode | Field overwriting mode. The default is fill-auto | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |
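The interaction of depth, prefix, and suffix can be sketched in Python as follows. This is an illustration only, not the actual implementation of e_json: the helper name expand_json is hypothetical, only fmt="simple" is covered, and two behaviors are assumptions made for the sketch (nested values below the depth limit are kept as JSON text, and existing fields are never overwritten).

```python
import json


def expand_json(event, field, depth=100, prefix="", suffix=""):
    """Expand a JSON string field into top-level fields (fmt="simple" only)."""
    result = dict(event)

    def walk(node, level):
        for key, value in node.items():
            if isinstance(value, dict) and level < depth:
                # Still within the allowed depth: descend into the child object.
                walk(value, level + 1)
            else:
                if isinstance(value, (dict, list)):
                    # Assumption: unexpanded nested values are kept as JSON text.
                    value = json.dumps(value)
                # Assumption: fields that already exist are left untouched.
                result.setdefault(prefix + key + suffix, value)

    walk(json.loads(event[field]), 1)
    return result


log = {"content": '{"a": "a1", "b": "b1"}'}
print(expand_json(log, "content"))
print(expand_json(log, "content", prefix="_", suffix="__"))
```

With depth=1, a nested object such as {"a": {"b": 1}} is not descended into, so the sketch stores its JSON text under the field a.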
Example
- Example 1
Original log:
{"content": "{\"a\": \"a1\", \"b\": \"b1\"}"}
Processing rules:
e_json("content")
Processing results:
{"content": "{\"a\": \"a1\", \"b\": \"b1\"}", "a": "a1", "b": "b1"}
- Example 2
Original log:
{"content": "{\"a\": \"a1\", \"b\": \"b1\"}"}
Processing rules:
e_json("content", prefix="_", suffix="__")
Processing results:
{"content": "{\"a\": \"a1\", \"b\": \"b1\"}", "_a__": "a1", "_b__": "b1"}
e_sep function
Function definition
Extract field values by splitting the source field on a specified separator, which may consist of multiple characters.
Syntax description
e_sep(src_field, fields_info, sep=" ", quote="", restrict=false, mode="fill-auto")
Parameter description
| Parameter name | Parameter description | Parameter type | Required or not | Parameter default | Parameter range |
|---|---|---|---|---|---|
| src_field | The field name to be extracted | String | Yes | - | - |
| fields_info | The target field name after matching. | List<Table> | Yes | - | - |
| sep | Separator, not limited to a single character. | String | No | Space | - |
| quote | Quote character, used to wrap the value. | String | No | - | - |
| restrict | Behavior when the number of extracted values differs from the number of target fields. true: ignore the log, no extraction is performed; false: try to match the first few fields. The default is false. | String | No | false | true/false |
| mode | Field overwriting mode. The default is fill-auto | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |
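The effect of restrict can be illustrated with a short Python sketch. It is not the actual implementation of e_sep: the helper name sep_extract is hypothetical, the quote and mode parameters are left out, and the type names in fields_info are ignored, so every value stays a string.

```python
def sep_extract(event, src_field, fields_info, sep=" ", restrict=False):
    """Split a field on sep and assign the parts to the target fields in order."""
    values = event[src_field].split(sep)
    result = dict(event)

    if restrict and len(values) != len(fields_info):
        # Counts differ and restrict is on: leave the event unchanged.
        return result

    # zip stops at the shorter sequence, so only the leading fields are matched.
    for target, value in zip(fields_info, values):
        name = next(iter(target))
        result[name] = value
    return result


log = {"content": "a1 b1"}
three_fields = [{"a": "string"}, {"b": "string"}, {"c": "string"}]
print(sep_extract(log, "content", three_fields))
print(sep_extract(log, "content", three_fields, restrict=True))
```

The first call fills a and b and leaves c unset. The second call returns the log unchanged because two values cannot fill three target fields.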
Example
- Example 1
Original log:
{"content": "a1 b1"}
Processing rules:
e_sep('content', [{'a':'string'}, {'b':'string'}])
Processing results:
{"content": "a1 b1", "a": "a1", "b": "b1"}
- Example 2
Original log:
{"content": "a1 b1"}
Processing rules:
e_sep('content', [{'a':'string'}])
Processing results:
{"content": "a1 b1", "a": "a1"}
- Example 3
Original log:
{"content": "a1 b1"}
Processing rules:
e_sep('content', [{'a':'string'}, {'b':'string'}, {'c':'string'}])
Processing results:
{"content": "a1 b1", "a": "a1", "b": "b1"}
e_csv function
Function definition
Extract field values by splitting the source field on a specified separator, which may consist of multiple characters. The default separator is a comma (,).
Syntax description
e_csv(src_field, fields_info, sep=",", quote="", restrict=false, mode="fill-auto")
Parameter description
| Parameter name | Parameter description | Parameter type | Required or not | Parameter default | Parameter range |
|---|---|---|---|---|---|
| src_field | The field name to be extracted | String | Yes | - | - |
| fields_info | The target field name after matching. | List<Table> | Yes | - | - |
| sep | Separator, not limited to a single character. | String | No | , | - |
| quote | Quote character, used to wrap the value. | String | No | - | - |
| restrict | Behavior when the number of extracted values differs from the number of target fields. true: ignore the log, no extraction is performed; false: try to match the first few fields. The default is false. | String | No | false | true/false |
| mode | Field overwriting mode. The default is fill-auto | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |
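The role of the quote parameter can be sketched with Python's standard csv module. This is an assumption-laden illustration rather than the behavior of e_csv itself: the helper name csv_extract is hypothetical, the sketch assumes that a separator inside a quoted value does not split the value, and csv.reader accepts only a single-character delimiter, whereas sep is documented as not limited to a single character.

```python
import csv


def csv_extract(event, src_field, fields_info, sep=",", quote=""):
    """Split a field with csv.reader, optionally honoring a quote character."""
    if quote:
        reader = csv.reader([event[src_field]], delimiter=sep, quotechar=quote)
    else:
        # No quote character configured: treat quotes as ordinary text.
        reader = csv.reader([event[src_field]], delimiter=sep,
                            quoting=csv.QUOTE_NONE)
    values = next(reader)

    result = dict(event)
    for target, value in zip(fields_info, values):
        name = next(iter(target))  # type names are ignored in this sketch
        result[name] = value
    return result


fields = [{"a": "string"}, {"b": "string"}]
print(csv_extract({"content": "a1,b1"}, "content", fields))
print(csv_extract({"content": 'a1,"b1,b2"'}, "content", fields, quote='"'))
```

In the second call the quoted value b1,b2 is kept whole and assigned to b.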
Example
- Example 1
Original log:
{"content": "a1,b1"}
Processing rules:
e_csv('content', [{'a':'string'}, {'b':'string'}])
Processing results:
{"content": "a1,b1", "a": "a1", "b": "b1"}
- Example 2
Original log:
{"content": "a1,b1"}
Processing rules:
e_csv('content', [{'a':'string'}])
Processing results:
{"content": "a1,b1", "a": "a1"}
- Example 3
Original log:
{"content": "a1,b1"}
Processing rules:
e_csv('content', [{'a':'string'}, {'b':'string'}, {'c':'string'}])
Processing results:
{"content": "a1,b1", "a": "a1", "b": "b1"}
e_psv function
Function definition
Extract field values by splitting the source field on a specified separator, which may consist of multiple characters. The default separator is a vertical bar (|).
Syntax description
e_psv(src_field, fields_info, sep="|", quote="", restrict=false, mode="fill-auto")
Parameter description
| Parameter name | Parameter description | Parameter type | Required or not | Parameter default | Parameter range |
|---|---|---|---|---|---|
| src_field | The field name to be extracted | String | Yes | - | - |
| fields_info | The target field name after matching. | List<Table> | Yes | - | - |
| sep | Separator, not limited to a single character. | String | No | \| | - |
| quote | Quote character, used to wrap the value. | String | No | - | - |
| restrict | Behavior when the number of extracted values differs from the number of target fields. true: ignore the log, no extraction is performed; false: try to match the first few fields. The default is false. | String | No | false | true/false |
| mode | Field overwriting mode. The default is fill-auto | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |
Example
- Example 1
Original log:
{"content": "a1|b1"}
Processing rules:
e_psv('content', [{'a':'string'}, {'b':'string'}])
Processing results:
{"content": "a1|b1", "a": "a1", "b": "b1"}
- Example 2
Original log:
{"content": "a1|b1"}
Processing rules:
e_psv('content', [{'a':'string'}])
Processing results:
{"content": "a1|b1", "a": "a1"}
- Example 3
Original log:
{"content": "a1|b1"}
Processing rules:
e_psv('content', [{'a':'string'}, {'b':'string'}, {'c':'string'}])
Processing results:
{"content": "a1|b1", "a": "a1", "b": "b1"}
e_tsv function
Function definition
Extract field values by splitting the source field on a specified separator, which may consist of multiple characters. The default separator is a tab (\t).
Syntax description
e_tsv(src_field, fields_info, sep="\t", quote="", restrict=false, mode="fill-auto")
Parameter description
| Parameter name | Parameter description | Parameter type | Required or not | Parameter default | Parameter range |
|---|---|---|---|---|---|
| src_field | The field name to be extracted | String | Yes | - | - |
| fields_info | The target field name after matching. | List<Table> | Yes | - | - |
| sep | Separator, not limited to a single character. | String | No | \t | - |
| quote | Quote character, used to wrap the value. | String | No | - | - |
| restrict | Behavior when the number of extracted values differs from the number of target fields. true: ignore the log, no extraction is performed; false: try to match the first few fields. The default is false. | String | No | false | true/false |
| mode | Field overwriting mode. The default is fill-auto | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |
Example
- Example 1
Original log:
{"content": "a1\tb1"}
Processing rules:
e_tsv('content', [{'a':'string'}, {'b':'string'}])
Processing results:
{"content": "a1\tb1", "a": "a1", "b": "b1"}
- Example 2
Original log:
{"content": "a1\tb1"}
Processing rules:
e_tsv('content', [{'a':'string'}])
Processing results:
{"content": "a1\tb1", "a": "a1"}
- Example 3
Original log:
{"content": "a1\tb1"}
Processing rules:
e_tsv('content', [{'a':'string'}, {'b':'string'}, {'c':'string'}])
Processing results:
{"content": "a1\tb1", "a": "a1", "b": "b1"}
e_kv function
Function definition
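Extract field values based on two-level separators: the regular expression reg matches each key-value pair, and keyIndex and valueIndex select the key and the value within the match.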
Syntax description
e_kv(src_field, reg, keyIndex, valueIndex, fields_info=None, mode="fill-auto")
Parameter description
| Parameter name | Parameter description | Parameter type | Required or not | Parameter default | Parameter range |
|---|---|---|---|---|---|
| src_field | The field name to be extracted | String | Yes | - | - |
| reg | The regular expression that matches keys and values | String | Yes | - | - |
| keyIndex | The index of the key, indicating which part of the regular expression match is used as the key | Int | Yes | - | - |
| valueIndex | The index of the value, indicating which part of the regular expression match is used as the value | Int | Yes | - | - |
| fields_info | The target field name after matching. | List<Table> | No | - | - |
| mode | Field overwriting mode. The default is fill-auto | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |
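The roles of reg, keyIndex, and valueIndex can be illustrated with a minimal Python sketch. It is not the actual implementation of e_kv: the helper name kv_extract is hypothetical, the indexes are assumed to be 1-based capture group numbers (which is consistent with the example in this section), and the fields_info and mode parameters are ignored.

```python
import re


def kv_extract(event, src_field, reg, key_index, value_index):
    """Turn every regex match into one field: group key_index -> group value_index."""
    result = dict(event)
    for match in re.finditer(reg, event[src_field]):
        # Assumption: indexes refer to capture groups, counted from 1.
        result[match.group(key_index)] = match.group(value_index)
    return result


log = {"content": "a:a1, b:b1"}
print(kv_extract(log, "content", r"([a-z]+):([a-z0-9]+)", 1, 2))
```

Each match of the pattern yields one key-value pair, so the log above gains the fields a and b.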
Example
- Example 1
Original log:
{"content": "a:a1, b:b1"}
Processing rules:
e_kv('content', '([a-z]+):([a-z0-9]+)', 1, 2, [{'a':'string'}, {'b':'string'}])
Processing results:
{"content": "a:a1, b:b1", "a": "a1", "b": "b1"}