百度智能云

All Product Document

          Data Warehouse

          CREATE-TABLE

          CREATE TABLE

          Description

          This command is used to create a table. This article mainly introduces the syntax of creating tables self-maintained by Palo. For syntax of external tables, refer to [CREATE EXTERNAL TABLE](PALO/Operating Manual/Data Load /Synchronize Data Through External Table.md#) Document.

          CREATE TABLE [IF NOT EXISTS] [database.]table
          (
              column_definition_list,
              [index_definition_list]
          )
          [engine_type]
          [keys_type]
          [table_comment]
          [partition_info]
          distribution_info
          [rollup_list]
          [properties]
          [extra_properties]
          • column_definition_list

            Column definition list:

            column_definition[, column_definition]

            • column_definition

              Column definition:

              column_name column_type [KEY] [aggr_type] [NULL] [default_value] [column_comment]

              • column_type

                Column type, the following types are supported:

                TINYINT(1 byte)
                   Range: -2^7 + 1 ~ 2^7 - 1
                SMALLINT(2 bytes)
                    Range: -2^15 + 1 ~ 2^15 - 1
                INT(4 bytes)
                    Range: -2^31 + 1 ~ 2^31 - 1
                BIGINT(8 bytes)
                   Range: -2^63 + 1 ~ 2^63 - 1
                LARGEINT(16 bytes)
                    Range: -2^127 + 1 ~ 2^127 - 1
                FLOAT(4 bytes)
                    support scientific counting method
                DOUBLE(12 bytes)
                    support scientific counting method
                DECIMAL[(precision, scale)] (16 bytes)
                    guarantee the decimal type of precision, default to DECIMAL(10, 0)
                    precision: 1 ~ 27
                    scale: 0 ~ 9
                    In which, the integer part is 1 ~ 18
                    not support the scientific counting method
                DATE(3 bytes)
                    Range: 0000-01-01 ~ 9999-12-31
                DATETIME(8 bytes)
                    Range:0000-01-01 00:00:00 ~ 9999-12-31 23:59:59
                CHAR[(length)]
                    fixed-length string. Range of length: 1 ~ 255. Default to 1
                VARCHAR[(length)]
                    Variable-length string. Range of length: 1 ~ 65533. Default to 1
                HLL (1~16385 bytes)
                    HyperLogLog column type,there is no need to specify the length and default value.The length is controlled in the system according to the aggregation degree of data.
                   Must be used in combination with HLL_UNION aggregation type.
                BITMAP
                    bitmap column type,there is no need to specify the length and default value.It represents the sets of integers, elements support up to 2^64 - 1.
                    Must be used in combination with BITMAP_UNION aggregation type.
              • aggr_type

                Aggregation type, the following types are supported:

                SUM:sum.Applicable to numeric type.
                MIN:minimum value. Applicable to numeric type.
                MAX:maximum value. Applicable to numeric type.
                REPLACE:replace. For rows with the same dimension column, the index columns will be imported in the order of import, and replace the one imported previously with the one imported later.
                REPLACE_IF_NOT_NULL:replace if not null. The difference between it with REPLACE lies in only replacing values not null.
                HLL_UNION:aggregation method for columns of HLL type, aggregate by HyperLogLog algorithm.
                BITMAP_UNION:aggregation method for columns of BIMTAP type, aggregate the union set of bitmaps. 

              Example:

                ​```
                k1 TINYINT,
                k2 DECIMAL(10,2) DEFAULT "10.5",
                k4 BIGINT NULL DEFAULT VALUE "1000" COMMENT "This is column k4",
                v1 VARCHAR(10) REPLACE NOT NULL,
                v2 BITMAP BITMAP_UNION,
                v3 HLL HLL_UNION,
                v4 INT SUM NOT NULL DEFAULT "1" COMMENT "This is column v4"
                ​```
          • index_definition_list

            Index list definition:

            index_definition[, index_definition]

            • index_definition

              Index definition:

              INDEX index_name (col_name[, col_name, ...]) [USING BITMAP] COMMENT 'xxxxxx'

              Example:

              INDEX idx1 (k1, k2) USING BITMAP COMMENT "This is a bitmap index"
          • engine_type

            Table engine type. All types in this document are OLAP. For other external table engine types, refer to CREATE EXTERNAL TABLE document. Example:

            ENGINE=olap

          • key_desc

            Data model.

            key_type(col1, col2, ...)

            key_type supports the following models:

            • DUPLICATE KEY(default): columns specified after that are ordering columns.
            • AGGREGATE KEY:the columns specified after that are dimensional columns.
            • UNIQUE KEY:the columns specified after that are key columns.

            Example:

            DUPLICATE KEY(col1, col2),
            AGGREGATE KEY(k1, k2, k3),
            UNIQUE KEY(k1, k2)
          • table_comment

            Table comment. Example:

            COMMENT "This is my first PALO table"
          • partition_desc

            Partition information, support two kinds of writing:

            1. LESS THAN:only define the upper bound of partition. Lower bound is determined by the upper bound of last partition.

              PARTITION BY RANGE(col1[, col2, ...])
              (
                  PARTITION partition_name1 VALUES LESS THAN MAXVALUE|("value1", "value2", ...),
                  PARTITION partition_name2 VALUES LESS THAN MAXVALUE|("value1", "value2", ...)
              )
            2. FIXED RANGE: define the left-closed and right open interval of partition.

              PARTITION BY RANGE(col1[, col2, ...])
              (
                  PARTITION partition_name1 VALUES [("k1-lower1", "k2-lower1", "k3-lower1",...), ("k1-upper1", "k2-upper1", "k3-upper1", ...)),
                  PARTITION partition_name2 VALUES [("k1-lower1-2", "k2-lower1-2", ...), ("k1-upper1-2", MAXVALUE, ))
              )
          • distribution_desc

            Define the data bucketing method.

            DISTRIBUTED BY HASH (k1[,k2 ...]) [BUCKETS num]

          • rollup_list

            Multiple materialized views (ROLLUP) can be created while creating tables.

            ROLLUP (rollup_definition[, rollup_definition, ...])

            • rollup_definition

              rollup_name (col1[, col2, ...]) [DUPLICATE KEY(col1[, col2, ...])] [PROPERTIES("key" = "value")]

              Example:

              ROLLUP (
                  r1 (k1, k3, v1, v2),
                  r2 (k1, v1)
              )
          • properties

            Set table properties. Currently the following properties are supported:

            • replication_num

              Replication number. The default replication number is 3. If the number of BE node is less than 3, it is required to specify the replication number less than or equal to the number of BE node.

            • storage_medium/storage_cooldown_time

              Data storage medium.storage_medium is used to declare the initial storage medium of table data, while storage_cooldown_time is used to set the expiration time. Example:

              "storage_medium" = "SSD",
              "storage_cooldown_time" = "2020-11-20 00:00:00"

              This example indicates that the data is stored in SSD and will be automatically migrated to HDD storage after it expires at 00: 00: 00 on 2020-11-20.

            • colocate_with

              When it needs to use Colocation Join function, use this parameter to set Colocation Group.

              "colocate_with" = "group1"

            • bloom_filter_columns

              Users specify the list of column names to be added with Bloom Filter index. The Bloom Filter index of each column is independent, not combined.

              "bloom_filter_columns" = "k1, k2, k3"

            • in_memory

              Set whether the table is Memory Table through this property.

              "in_memory" = "true"

            • function_column.sequence_type

              When using the UNIQUE KEY model, a sequence column can be specified. When the KEY columns are the same, REPLACE will be done by the sequence column (use the larger value to replace the smaller value, otherwise the replacement cannot be executed)

              Here we only need to specify the type of sequence columns, time type or integer type is supported. Palo will create a hidden sequence column.

              "function_column.sequence_type" = 'Date'

            • Dynamic partition related

              Parameters related to dynamic partition are as follows:

              • dynamic_partition.enable: used to specify whether the dynamic partitioning function at table level is turned on. Default to true。
              • dynamic_partition.time_unit: Used to specify the time unit for dynamically adding partitions, which can be selected as DAY, WEEK and MONTH
              • dynamic_partition.start: Used to specify how many partitions are deleted forward. The value must be less than 0. Default to Integer.MIN_VALUE。
              • dynamic_partition.end: Used to specify the number of partitions created in advance. The value must be greater than 0.
              • dynamic_partition.prefix: Used to specify the prefix of the created partition name. for example, if the prefix of the partition name is p, the partition name will be automatically created as p20200108
              • dynamic_partition.buckets: Used to specify the number of partition and buckets automatically created

          Example

          1. Create a detailed model table

            CREATE TABLE example_db.table_hash
            (
                k1 TINYINT,
                k2 DECIMAL(10, 2) DEFAULT "10.5",
                k3 CHAR(10) COMMENT "string column",
                k4 INT NOT NULL DEFAULT "1" COMMENT "int column"
            )
            COMMENT "my first table"
            DISTRIBUTED BY HASH(k1) BUCKETS 32
          2. Create a detailed model table, partition and specify the ordering column, set replication number as 1

            CREATE TABLE example_db.table_hash
            (
                k1 DATE,
                k2 DECIMAL(10, 2) DEFAULT "10.5",
                k3 CHAR(10) COMMENT "string column",
                k4 INT NOT NULL DEFAULT "1" COMMENT "int column"
            )
            DUPLICATE KEY(k1, k2)
            COMMENT "my first table"
            PARTITION BY RANGE(k1)
            (
                PARTITION p1 VALUES LESS THAN ("2020-02-01"),
                PARTITION p1 VALUES LESS THAN ("2020-03-01"),
                PARTITION p1 VALUES LESS THAN ("2020-04-01")
            )
            DISTRIBUTED BY HASH(k1) BUCKETS 32
            PROPERTIES (
                "replication_num" = "1"
            );
          3. Create a unique key model table, set the initial storage medium and cooldown time

            CREATE TABLE example_db.table_hash
            (
                k1 BIGINT,
                k2 LARGEINT,
                v1 VARCHAR(2048) REPLACE,
                v2 SMALLINT SUM DEFAULT "10"
            )
            UNIQUE KEY(k1, k2)
            DISTRIBUTED BY HASH (k1, k2) BUCKETS 32
            PROPERTIES(
                "storage_medium" = "SSD",
                "storage_cooldown_time" = "2015-06-04 00:00:00"
            );
          4. Create an aggregate model table and describe by partition with fixed zone

            CREATE TABLE table_range
            (
                k1 DATE,
                k2 INT,
                k3 SMALLINT,
                v1 VARCHAR(2048) REPLACE,
                v2 INT SUM DEFAULT "1"
            )
            AGGREGATE KEY(k1, k2, k3)
            PARTITION BY RANGE (k1, k2, k3)
            (
                PARTITION p1 VALUES [("2014-01-01", "10", "200"), ("2014-01-01", "20", "300")),
                PARTITION p2 VALUES [("2014-06-01", "100", "200"), ("2014-07-01", "100", "300"))
            )
            DISTRIBUTED BY HASH(k2) BUCKETS 32
          5. Create an aggregate model table containing HLL and BITMAP column types

            CREATE TABLE example_db.example_table
            (
                k1 TINYINT,
                k2 DECIMAL(10, 2) DEFAULT "10.5",
                v1 HLL HLL_UNION,
                v2 BITMAP BITMAP_UNION
            )
            ENGINE=olap
            AGGREGATE KEY(k1, k2)
            DISTRIBUTED BY HASH(k1) BUCKETS 32
          6. Create two tables self-maintained by same Colocation Group.

            CREATE TABLE t1 (
                id int(11) COMMENT "",
                value varchar(8) COMMENT ""
            )
            DUPLICATE KEY(id)
            DISTRIBUTED BY HASH(id) BUCKETS 10
            PROPERTIES (
                "colocate_with" = "group1"
            );
            
            CREATE TABLE t2 (
                id int(11) COMMENT "",
                value1 varchar(8) COMMENT "",
                value2 varchar(8) COMMENT ""
            )
            DUPLICATE KEY(`id`)
            DISTRIBUTED BY HASH(`id`) BUCKETS 10
            PROPERTIES (
                "colocate_with" = "group1"
            );
          7. Create a memory table with bitmap index and bloom filter index

            CREATE TABLE example_db.table_hash
            (
                k1 TINYINT,
                k2 DECIMAL(10, 2) DEFAULT "10.5",
                v1 CHAR(10) REPLACE,
                v2 INT SUM,
                INDEX k1_idx (k1) USING BITMAP COMMENT 'my first index'
            )
            AGGREGATE KEY(k1, k2)
            DISTRIBUTED BY HASH(k1) BUCKETS 32
            PROPERTIES (
                "bloom_filter_columns" = "k2",
                "in_memory" = "true"
            );
          8. Create a dynamic partition table.

            Create partitions for three days in the table in advance every day, and delete partitions three days before. For example, today is 2020-01-08, then the partitions named as p20200108, p20200109, p20200110, p20200111 will be created. The ranges of such partitions are respectively:

            [types: [DATE]; keys: [2020-01-08]; ‥types: [DATE]; keys: [2020-01-09]; )
            [types: [DATE]; keys: [2020-01-09]; ‥types: [DATE]; keys: [2020-01-10]; )
            [types: [DATE]; keys: [2020-01-10]; ‥types: [DATE]; keys: [2020-01-11]; )
            [types: [DATE]; keys: [2020-01-11]; ‥types: [DATE]; keys: [2020-01-12]; )
            CREATE TABLE example_db.dynamic_partition
            (
                k1 DATE,
                k2 INT,
                k3 SMALLINT,
                v1 VARCHAR(2048),
                v2 DATETIME DEFAULT "2014-02-04 15:36:00"
            )
            DUPLICATE KEY(k1, k2, k3)
            PARTITION BY RANGE (k1) ()
            DISTRIBUTED BY HASH(k2) BUCKETS 32
            PROPERTIES(
                "dynamic_partition.time_unit" = "DAY",
                "dynamic_partition.start" = "-3",
                "dynamic_partition.end" = "3",
                "dynamic_partition.prefix" = "p",
                "dynamic_partition.buckets" = "32" 
            );
          9. Create a table with materialized view (ROLLUP).

            CREATE TABLE example_db.rolup_index_table
            (
                event_day DATE,
                siteid INT DEFAULT '10',
                citycode SMALLINT,
                username VARCHAR(32) DEFAULT '',
                pv BIGINT SUM DEFAULT '0'
            )
            AGGREGATE KEY(event_day, siteid, citycode, username)
            DISTRIBUTED BY HASH(siteid) BUCKETS 10
            ROLLUP (
                r1(event_day,siteid),
                r2(event_day,citycode),
                r3(event_day)
            )
            PROPERTIES("replication_num" = "3");

          Keyword

          CREATE, TABLE

          Best Practices

          Partition and bucketing

          In a table, a bucketing column must be specified, and a partition may not. Detailed introduction of partition and bucketing can be found in Data Division document.

          Tables in Palo can be divided into partitioned tables and non-partitioned tables. This property is determined when the table is created, and cannot be changed afterwards. For partitioned tables, operations such as addition, deletion can be done during the process of use, while for non-partitioned tables, such operations are not allowed.

          At the same time, the partition columns and bucketing columns cannot be changed after the table is created, neither the type of partition column and bucketing column can be changed, nor any addition or deletion operation can be performed on these columns.

          So it is suggested to confirm the usage mode before building the table.

          Dynamic Partition

          The dynamic partition function is mainly used to help users manage partitions automatically. After set with related rules, Palo system regularly adds new partitions or deletes historical partitions. For more help, refer to Dynamic Partition document.

          Materialized Views

          Users can create multiple materialized views (ROLLUP) while building tables. Materialized views can also be added after the table is created. It can facilitate users to create all materialized views at one time by writing it in table building statements.

          If materialized views are created during table building process, all subsequent data import operations will generate materialized view data synchronously. The number of materialized views may affect the efficiency of data import.

          If materialized views are added during later use, if there is data in the table, the creation time of materialized views depends on the current data amount.

          Introduction to materialized views can be found in Materialized Views.

          Index

          Users can create indexes of multiple columns while building tables. Indexes can also be added after the table is created.

          When adding index in the later use, if there is data existing in the table, it is required to rewrite all the data, and the index creation time will be determined by the current data volume.

          Memory Table

          When the "in_memory" = "true" property is specified when building table, Palo will cache the data blocks of this table in the PageCache of the storage engine as far as possible to reduce the disk IO. However, this property cannot guarantee the data block is stored in the memory all the time but only works as an indicator of best-effort.

          Previous
          CREATE-FILE
          Next
          CREATE-REPOSITORY