          Field value extraction functions

          Introduction

          A common use case of field value extraction functions is shown in the following figure. Once a log has been processed into structured data, it can be used in SQL analysis scenarios.

          e_regex function

          Function definition

          Extract field values that match a regular expression.

          Syntax description

          e_regex(field, regex, fields_info=None, mode="fill-auto", pack_json='')

          Parameter description

          | Parameter | Description | Type | Required | Default | Value range |
          | --- | --- | --- | --- | --- | --- |
          | field | Name of the field to extract from | String | Yes | - | - |
          | regex | Regular expression | String | Yes | - | - |
          | fields_info | Target field names for the matched values. Required when the regular expression does not use named capture groups. | List<Table> | No | - | - |
          | mode | Field overwriting mode | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |
          | pack_json | Name of the field into which all matching results of the regular expression are packed. The default value is empty, which means no packing. | String | No | - | - |

          Example

          • Example 1

          Original log:

          {"content": "1234abcd5678"}  

          Processing rules:

          e_regex("content", "\d+", [{'target1':'long'}])  

          Processing results:

          {"content": "1234abcd5678", "target1": 1234}
          • Example 2

          Original log:

          {"content": "1234abcd"}  

          Processing rules:

          e_regex("content", "(?<target1>\d+)(.*)", [{'target2':'string'}])  

          Processing results:

          {"content": "1234abcd", "target1": "1234", "target2": "abcd"}
          • Example 3

          Original log:

          {"content": "1234abcd5678"}  

          Processing rules:

          e_regex("content", "\d+", [{'target1':'long'}, {'target2':'long'}])  

          Processing results:

          {"content": "1234abcd5678", "target1": 1234, "target2": 5678}
          • Example 4

          Original log:

          {"content": "1234abcd5678"}  

          Processing rules:

          e_regex("content", "\d+", [{'target1':'long'}, {'target2':'long'}], pack_json='new')  

          Processing results:

          {"content": "1234abcd5678", "new": {"target1": 1234, "target2": 5678}}
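
The four examples above can be reproduced locally with a small Python emulation. This is only a sketch under stated assumptions, not the service's implementation: `e_regex_sketch` and `CASTS` are hypothetical names, the `mode` parameter is ignored, and only the `long` and `string` types are handled. Python's `re` module spells named groups as `(?P<name>...)`, so the rule from Example 2 has to be written that way here.

```python
import re

# Hypothetical mapping from the type names used in the examples to Python converters.
CASTS = {"long": int, "string": str}


def e_regex_sketch(event, field, regex, fields_info=None, pack_json=""):
    """Emulate the documented e_regex examples on a plain dict (mode is ignored)."""
    result = dict(event)
    text = event.get(field)
    if text is None:
        return result

    # Flatten [{'target1': 'long'}, ...] into [('target1', 'long'), ...].
    targets = [pair for item in (fields_info or []) for pair in item.items()]
    pattern = re.compile(regex)
    extracted = {}

    if pattern.groups == 0:
        # No capture groups: the n-th match fills the n-th target field.
        values = pattern.findall(text)
    else:
        match = pattern.search(text)
        if match is None:
            return result
        # Named groups supply their own field names.
        extracted.update(match.groupdict())
        named = set(pattern.groupindex.values())
        # Unnamed groups are assigned to fields_info in order.
        values = [match.group(i) for i in range(1, pattern.groups + 1)
                  if i not in named]

    for (name, type_name), value in zip(targets, values):
        if value is not None:
            extracted[name] = CASTS[type_name](value)

    if pack_json:
        # Assumption: packed results are stored as a nested object.
        result[pack_json] = extracted
    else:
        result.update(extracted)
    return result
```

Calling `e_regex_sketch({"content": "1234abcd5678"}, "content", r"\d+", [{"target1": "long"}, {"target2": "long"}], pack_json="new")` nests both extracted numbers under the `new` key, as in Example 4.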

          e_json function

          Function definition

          Extract field values from JSON.

          Syntax description

          e_json(field, depth=100, prefix="", suffix="", fmt="simple", sep=".", mode="fill-auto")

          Parameter description

          | Parameter | Description | Type | Required | Default | Value range |
          | --- | --- | --- | --- | --- | --- |
          | field | Name of the field to extract from | String | Yes | - | - |
          | depth | Depth of field expansion. A value of 1 expands only the first layer. | Int | No | 100 | 1~2000 |
          | prefix | Prefix added to the field name when expanding | String | No | - | - |
          | suffix | Suffix added to the field name when expanding | String | No | - | - |
          | fmt | Formatting method | String | No | simple | simple (default): use the node name as the field name; full: combine the parent node and the current node as the field name; parent: use the complete path as the field name; root: combine the root node and the current node as the field name |
          | sep | Separator between parent and child nodes. Takes effect when fmt is full, parent, or root. | String | No | . | - |
          | mode | Field overwriting mode | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |

          Example

          • Example 1

          Original log:

          {"content": "{\"a\": \"a1\", \"b\": \"b1\"}"}  

          Processing rules:

          e_json("content")  

          Processing results:

          {"content": "{\"a\": \"a1\", \"b\": \"b1\"}", "a": "a1", "b": "b1"}
          • Example 2

          Original log:

          {"content": "{\"a\": \"a1\", \"b\": \"b1\"}"}  

          Processing rules:

          e_json("content", prefix="_", suffix="__")  

          Processing results:

          {"content": "{\"a\": \"a1\", \"b\": \"b1\"}", "_a__": "a1", "_b__": "b1"}
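
A minimal Python sketch of the expansion shown in the two examples above may help when testing rules offline. It is an illustration under assumptions, not the service's code: `e_json_sketch` is a hypothetical name, only the `simple` format is covered, existing fields are left untouched as a rough stand-in for the default mode, and nested values below the `depth` limit are kept as JSON text.

```python
import json


def e_json_sketch(event, field, depth=100, prefix="", suffix=""):
    """Expand a JSON string field into top-level fields (fmt="simple" only)."""
    result = dict(event)
    try:
        parsed = json.loads(event.get(field, ""))
    except (TypeError, ValueError):
        # Missing field or invalid JSON: leave the event unchanged.
        return result
    if not isinstance(parsed, dict):
        return result

    def expand(node, level):
        for key, value in node.items():
            if isinstance(value, dict) and level < depth:
                # Still within the depth limit: descend one layer.
                expand(value, level + 1)
            else:
                name = f"{prefix}{key}{suffix}"
                if isinstance(value, (dict, list)):
                    # Assumption: unexpanded containers are kept as JSON text.
                    value = json.dumps(value)
                # Assumption: an existing field is not overwritten.
                result.setdefault(name, value)

    expand(parsed, 1)
    return result
```

With `depth=1` only the first layer is expanded, so a nested object stays serialized; raising `depth` lets the inner keys surface as fields of their own.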

          e_sep function

          Function definition

          Extract field values by splitting the field content on a specified separator, which may consist of multiple characters.

          Syntax description

          e_sep(src_field, fields_info, sep=" ", quote="", restrict=false, mode="fill-auto")

          Parameter description

          | Parameter | Description | Type | Required | Default | Value range |
          | --- | --- | --- | --- | --- | --- |
          | src_field | Name of the field to extract from | String | Yes | - | - |
          | fields_info | Target field names for the extracted values | List<Table> | Yes | - | - |
          | sep | Separator, not limited to a single character | String | No | Space | - |
          | quote | Quote character used to wrap a value | String | No | - | - |
          | restrict | Behavior when the number of extracted values differs from the number of target fields. true: skip the log, no extraction is performed; false: fill as many of the leading fields as possible. | String | No | false | true/false |
          | mode | Field overwriting mode | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |

          Example

          • Example 1

          Original log:

          {"content": "a1 b1"}  

          Processing rules:

          e_sep('content', [{'a':'string'}, {'b':'string'}])

          Processing results:

          {"content": "a1 b1", "a": "a1", "b": "b1"}
          • Example 2

          Original log:

          {"content": "a1 b1"}

          Processing rules:

          e_sep('content', [{'a':'string'}])

          Processing results:

          {"content": "a1 b1", "a": "a1"}  
          • Example 3

          Original log:

          {"content": "a1 b1"}

          Processing rules:

          e_sep('content', [{'a':'string'}, {'b':'string'}, {'c':'string'}])

          Processing results:

          {"content": "a1 b1", "a": "a1", "b": "b1"}
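
The effect of `restrict` when the value count and the target count differ can be sketched in Python as follows. This is a hedged illustration rather than the service's implementation: `e_sep_sketch` is a hypothetical name, and the `quote` and `mode` parameters are left out to keep the focus on splitting and the count check.

```python
def e_sep_sketch(event, src_field, fields_info, sep=" ", restrict=False):
    """Split a field on sep and assign the pieces to the target fields in order."""
    result = dict(event)
    text = event.get(src_field)
    if text is None:
        return result

    # str.split accepts separators of any length, e.g. "::".
    values = text.split(sep)
    targets = [pair for item in fields_info for pair in item.items()]

    if restrict and len(values) != len(targets):
        # Strict mode: a count mismatch means nothing is extracted.
        return result

    # Lenient mode: zip stops at the shorter list, so only the leading
    # fields that have a matching value are filled.
    for (name, type_name), value in zip(targets, values):
        result[name] = int(value) if type_name == "long" else value
    return result
```

For the log `{"content": "a1 b1"}`, three target fields with `restrict=False` fill only `a` and `b`, while `restrict=True` returns the log unchanged.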

          e_csv function

          Function definition

          Extract field values by splitting the field content on a separator, which defaults to a comma and may consist of multiple characters.

          Syntax description

          e_csv(src_field, fields_info, sep=",", quote="", restrict=false, mode="fill-auto")

          Parameter description

          | Parameter | Description | Type | Required | Default | Value range |
          | --- | --- | --- | --- | --- | --- |
          | src_field | Name of the field to extract from | String | Yes | - | - |
          | fields_info | Target field names for the extracted values | List<Table> | Yes | - | - |
          | sep | Separator, not limited to a single character | String | No | Comma (,) | - |
          | quote | Quote character used to wrap a value | String | No | - | - |
          | restrict | Behavior when the number of extracted values differs from the number of target fields. true: skip the log, no extraction is performed; false: fill as many of the leading fields as possible. | String | No | false | true/false |
          | mode | Field overwriting mode | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |

          Example

          • Example 1

          Original log:

          {"content": "a1,b1"}  

          Processing rules:

          e_csv('content', [{'a':'string'}, {'b':'string'}])

          Processing results:

          {"content": "a1,b1", "a": "a1", "b": "b1"}
          • Example 2

          Original log:

          {"content": "a1,b1"}

          Processing rules:

          e_csv('content', [{'a':'string'}])

          Processing results:

          {"content": "a1,b1", "a": "a1"}  
          • Example 3

          Original log:

          {"content": "a1,b1"}

          Processing rules:

          e_csv('content', [{'a':'string'}, {'b':'string'}, {'c':'string'}])

          Processing results:

          {"content": "a1,b1", "a": "a1", "b": "b1"}
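
The `quote` parameter matters when a value itself contains the separator. The sketch below uses Python's standard `csv` module to illustrate this; it is an assumption-laden emulation, not the service's code. `e_csv_sketch`, `e_psv_sketch`, and `e_tsv_sketch` are hypothetical names, every value is kept as a string, `mode` is ignored, and `csv.reader` accepts only a single-character delimiter, whereas the parameter table allows longer separators. The pipe- and tab-separated functions on this page have the same signature with a different default `sep`, so they are modeled here as thin wrappers.

```python
import csv
from functools import partial


def e_csv_sketch(event, src_field, fields_info, sep=",", quote="", restrict=False):
    """Split a delimited field, honoring an optional quote character."""
    result = dict(event)
    text = event.get(src_field)
    if text is None:
        return result

    if quote:
        # A quoted value may contain the separator itself.
        reader = csv.reader([text], delimiter=sep, quotechar=quote)
    else:
        # No quote character configured: split on every separator.
        reader = csv.reader([text], delimiter=sep, quoting=csv.QUOTE_NONE)
    values = next(reader, [])

    targets = [name for item in fields_info for name in item]
    if restrict and len(values) != len(targets):
        return result
    for name, value in zip(targets, values):
        result[name] = value
    return result


# Same logic with a different default separator.
e_psv_sketch = partial(e_csv_sketch, sep="|")
e_tsv_sketch = partial(e_csv_sketch, sep="\t")
```

With `quote='"'`, the content `a1,"b1,b2"` yields two values, `a1` and `b1,b2`; without a quote character the same content splits into three pieces.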

          e_psv function

          Function definition

          Extract field values by splitting the field content on a separator, which defaults to a vertical bar and may consist of multiple characters.

          Syntax description

          e_psv(src_field, fields_info, sep="|", quote="", restrict=false, mode="fill-auto")

          Parameter description

          | Parameter | Description | Type | Required | Default | Value range |
          | --- | --- | --- | --- | --- | --- |
          | src_field | Name of the field to extract from | String | Yes | - | - |
          | fields_info | Target field names for the extracted values | List<Table> | Yes | - | - |
          | sep | Separator, not limited to a single character | String | No | Vertical bar (\|) | - |
          | quote | Quote character used to wrap a value | String | No | - | - |
          | restrict | Behavior when the number of extracted values differs from the number of target fields. true: skip the log, no extraction is performed; false: fill as many of the leading fields as possible. | String | No | false | true/false |
          | mode | Field overwriting mode | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |

          Example

          • Example 1

          Original log:

          {"content": "a1|b1"}  

          Processing rules:

          e_psv('content', [{'a':'string'}, {'b':'string'}])

          Processing results:

          {"content": "a1|b1", "a": "a1", "b": "b1"}
          • Example 2

          Original log:

          {"content": "a1|b1"}

          Processing rules:

          e_psv('content', [{'a':'string'}])

          Processing results:

          {"content": "a1|b1", "a": "a1"}  
          • Example 3

          Original log:

          {"content": "a1|b1"}

          Processing rules:

          e_psv('content', [{'a':'string'}, {'b':'string'}, {'c':'string'}])

          Processing results:

          {"content": "a1|b1", "a": "a1", "b": "b1"}

          e_tsv function

          Function definition

          Extract field values by splitting the field content on a separator, which defaults to a tab character and may consist of multiple characters.

          Syntax description

          e_tsv(src_field, fields_info, sep="\t", quote="", restrict=false, mode="fill-auto")

          Parameter description

          | Parameter | Description | Type | Required | Default | Value range |
          | --- | --- | --- | --- | --- | --- |
          | src_field | Name of the field to extract from | String | Yes | - | - |
          | fields_info | Target field names for the extracted values | List<Table> | Yes | - | - |
          | sep | Separator, not limited to a single character | String | No | Tab (\t) | - |
          | quote | Quote character used to wrap a value | String | No | - | - |
          | restrict | Behavior when the number of extracted values differs from the number of target fields. true: skip the log, no extraction is performed; false: fill as many of the leading fields as possible. | String | No | false | true/false |
          | mode | Field overwriting mode | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |

          Example

          • Example 1

          Original log:

          {"content": "a1\tb1"}  

          Processing rules:

          e_tsv('content', [{'a':'string'}, {'b':'string'}])

          Processing results:

          {"content": "a1\tb1", "a": "a1", "b": "b1"}
          • Example 2

          Original log:

          {"content": "a1\tb1"}

          Processing rules:

          e_tsv('content', [{'a':'string'}])

          Processing results:

          {"content": "a1\tb1", "a": "a1"}  
          • Example 3

          Original log:

          {"content": "a1\tb1"}

          Processing rules:

          e_tsv('content', [{'a':'string'}, {'b':'string'}, {'c':'string'}])

          Processing results:

          {"content": "a1\tb1", "a": "a1", "b": "b1"}

          e_kv function

          Function definition

          Extract key-value pairs from a field using a regular expression whose match groups identify the key and the value.

          Syntax description

          e_kv(src_field, reg, keyIndex, valueIndex, fields_info=None, mode="fill-auto")

          Parameter description

          | Parameter | Description | Type | Required | Default | Value range |
          | --- | --- | --- | --- | --- | --- |
          | src_field | Name of the field to extract from | String | Yes | - | - |
          | reg | Regular expression that matches a key and its value | String | Yes | - | - |
          | keyIndex | Index of the element in the regular expression match result that is used as the key | Int | Yes | - | - |
          | valueIndex | Index of the element in the regular expression match result that is used as the value | Int | Yes | - | - |
          | fields_info | Target field names for the matched values | List<Table> | No | - | - |
          | mode | Field overwriting mode | String | No | fill-auto | fill/fill-auto/add/add-auto/overwrite/overwrite-auto |

          Example

          • Example 1

          Original log:

          {"content": "a:a1, b:b1"}  

          Processing rules:

          e_kv('content', '([a-z]+):([a-z0-9]+)', 1, 2, [{'a':'string'}, {'b':'string'}])

          Processing results:

          {"content": "a:a1, b:b1", "a": "a1", "b": "b1"}
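
The role of `keyIndex` and `valueIndex` in the example above can be illustrated with a short Python sketch. It rests on assumptions that the source does not confirm: `e_kv_sketch` is a hypothetical name, the indexes are treated as regular expression group numbers (which matches the example, where 1 is the key and 2 is the value), `fields_info`, when given, is treated as a filter on which keys to keep, and `mode` is ignored.

```python
import re


def e_kv_sketch(event, src_field, reg, key_index, value_index, fields_info=None):
    """Extract key-value pairs with a regex; group indexes select key and value."""
    result = dict(event)
    text = event.get(src_field)
    if text is None:
        return result

    # Assumption: fields_info limits extraction to the listed keys and types.
    wanted = None
    if fields_info:
        wanted = {name: type_name
                  for item in fields_info
                  for name, type_name in item.items()}

    # Every non-overlapping match contributes one key-value pair.
    for match in re.finditer(reg, text):
        key = match.group(key_index)
        value = match.group(value_index)
        if wanted is not None and key not in wanted:
            continue
        if wanted is not None and wanted[key] == "long":
            value = int(value)
        result[key] = value
    return result
```

Swapping the two indexes would use the second group as the field name and the first as its value, which is why both must agree with the group order in `reg`.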