使用 Leading Hint 控制 Join 顺序
概述
Leading Hint 特性允许用户手工指定查询中的表的连接顺序,在特定场景优化复杂查询性能。本文将详细介绍如何在 PALO 中使用 Leading Hint 来控制 join 的顺序。
注意: 当前 PALO 已经具备良好的开箱即用的能力,也就意味着在绝大多数场景下,PALO 会自适应的优化各种场景下的性能,无需用户来手工控制 hint 来进行业务调优。本章介绍的内容主要面向专业调优人员,业务人员仅做简单了解即可。
案例 1:调整左右表顺序
对于如下查询:
1mysql> explain shape plan select from t1 join t2 on t1.c1 = t2.c2;
2+------------------------------------------------------------------------------+
3| _Explain_ String(Nereids Planner)                                              |
4+------------------------------------------------------------------------------+
5| PhysicalResultSink                                                           |
6| --PhysicalDistribute[DistributionSpecGather]                                 |
7| ----PhysicalProject                                                          |
8| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
9| --------PhysicalOlapScan[t1]                                                 |
10| --------PhysicalDistribute[DistributionSpecHash]                             |
11| ----------PhysicalOlapScan[t2]                                               |
12+------------------------------------------------------------------------------+可以使用 Leading Hint,强制指定 join order 为 t2 join t1,调整原始连接顺序。
1mysql> explain shape plan select /*+ leading(t2 t1) */ * from t1 join t2 on t1.c1 = t2.c2;
2+------------------------------------------------------------------------------+
3| _Explain_ String(Nereids Planner)                                              |
4+------------------------------------------------------------------------------+
5| PhysicalResultSink                                                           |
6| --PhysicalDistribute[DistributionSpecGather]                                 |
7| ----PhysicalProject                                                          |
8| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
9| --------PhysicalOlapScan[t2]                                                 |
10| --------PhysicalDistribute[DistributionSpecHash]                             |
11| ----------PhysicalOlapScan[t1]                                               |
12|                                                                              |
13| Hint log:                                                                    |
14| Used: leading(t2 t1)                                                         |
15| UnUsed:                                                                      |
16| SyntaxError:                                                                 |
17+------------------------------------------------------------------------------+Hint log 展示了应用成功的 hint: Used: leading(t2 t1)。
案例 2:强制生成左深树
1mysql> explain shape plan select /*+ leading(t1 t2 t3) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3;
2+--------------------------------------------------------------------------------+
3| _Explain_ String(Nereids Planner)                                                |
4+--------------------------------------------------------------------------------+
5| PhysicalResultSink                                                             |
6| --PhysicalDistribute[DistributionSpecGather]                                   |
7| ----PhysicalProject                                                            |
8| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=()   |
9| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
10| ----------PhysicalOlapScan[t1]                                                 |
11| ----------PhysicalDistribute[DistributionSpecHash]                             |
12| ------------PhysicalOlapScan[t2]                                               |
13| --------PhysicalDistribute[DistributionSpecHash]                               |
14| ----------PhysicalOlapScan[t3]                                                 |
15|                                                                                |
16| Hint log:                                                                      |
17| Used: leading(t1 t2 t3)                                                        |
18| UnUsed:                                                                        |
19| SyntaxError:                                                                   |
20+--------------------------------------------------------------------------------+同样,Hint log 展示了应用成功的 hint: Used: leading(t1 t2 t3)。
案例 3:强制生成右深树
1mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3;
2+----------------------------------------------------------------------------------+
3| _Explain_ String(Nereids Planner)                                                  |
4+----------------------------------------------------------------------------------+
5| PhysicalResultSink                                                               |
6| --PhysicalDistribute[DistributionSpecGather]                                     |
7| ----PhysicalProject                                                              |
8| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=()     |
9| --------PhysicalOlapScan[t1]                                                     |
10| --------PhysicalDistribute[DistributionSpecHash]                                 |
11| ----------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
12| ------------PhysicalOlapScan[t2]                                                 |
13| ------------PhysicalDistribute[DistributionSpecHash]                             |
14| --------------PhysicalOlapScan[t3]                                               |
15|                                                                                  |
16| Hint log:                                                                        |
17| Used: leading(t1 { t2 t3 })                                                      |
18| UnUsed:                                                                          |
19| SyntaxError:                                                                     |
20+----------------------------------------------------------------------------------+同样,Hint log 展示了应用成功的 hint: Used: leading(t1 { t2 t3 })。
案例 4:强制生成 bushy 树
1mysql> explain shape plan select /*+ leading({t1 t2} {t3 t4}) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3 join t4 on t3.c3 = t4.c4;
2+-----------------------------------------------+
3| _Explain_ String                                |
4+-----------------------------------------------+
5| PhysicalResultSink                            |
6| --PhysicalDistribute                          |
7| ----PhysicalProject                           |
8| ------hashJoin[INNER_JOIN](t2.c2 = t3.c3)     |
9| --------hashJoin[INNER_JOIN](t1.c1 = t2.c2)   |
10| ----------PhysicalOlapScan[t1]                |
11| ----------PhysicalDistribute                  |
12| ------------PhysicalOlapScan[t2]              |
13| --------PhysicalDistribute                    |
14| ----------hashJoin[INNER_JOIN](t3.c3 = t4.c4) |
15| ------------PhysicalOlapScan[t3]              |
16| ------------PhysicalDistribute                |
17| --------------PhysicalOlapScan[t4]            |
18|                                               |
19| Used: leading({ t1 t2 } { t3 t4 })            |
20| UnUsed:                                       |
21| SyntaxError:                                  |
22+-----------------------------------------------+同样,Hint log 展示了应用成功的 hint: Used: leading({ t1 t2 } { t3 t4 })。
案例 5:view 作为整体参与连接
1mysql>  explain shape plan select /*+ leading(alias t1) */ count(*) from t1 join (select c2 from t2 join t3 on t2.c2 = t3.c3) as alias on t1.c1 = alias.c2;
2+--------------------------------------------------------------------------------------+
3| _Explain_ String(Nereids Planner)                                                      |
4+--------------------------------------------------------------------------------------+
5| PhysicalResultSink                                                                   |
6| --hashAgg[GLOBAL]                                                                    |
7| ----PhysicalDistribute[DistributionSpecGather]                                       |
8| ------hashAgg[LOCAL]                                                                 |
9| --------PhysicalProject                                                              |
10| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = alias.c2)) otherCondition=()  |
11| ------------PhysicalProject                                                          |
12| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
13| ----------------PhysicalProject                                                      |
14| ------------------PhysicalOlapScan[t2]                                               |
15| ----------------PhysicalDistribute[DistributionSpecHash]                             |
16| ------------------PhysicalProject                                                    |
17| --------------------PhysicalOlapScan[t3]                                             |
18| ------------PhysicalDistribute[DistributionSpecHash]                                 |
19| --------------PhysicalProject                                                        |
20| ----------------PhysicalOlapScan[t1]                                                 |
21|                                                                                      |
22| Hint log:                                                                            |
23| Used: leading(alias t1)                                                              |
24| UnUsed:                                                                              |
25| SyntaxError:                                                                         |
26+--------------------------------------------------------------------------------------+同样,Hint log 展示了应用成功的 hint: Used: leading(alias t1)。
案例 6:DistributeHint 与 LeadingHint 混用
1explain shape plan
2    select 
3        nation,
4        o_year,
5        sum(amount) as sum_profit
6    from
7        (
8            select
9                /*+ leading(orders shuffle {lineitem shuffle part} shuffle {supplier broadcast nation} shuffle partsupp) */
10                n_name as nation,
11                extract(year from o_orderdate) as o_year,
12                l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
13            from
14                part,
15                supplier,
16                lineitem,
17                partsupp,
18                orders,
19                nation
20            where
21                s_suppkey = l_suppkey
22                and ps_suppkey = l_suppkey
23                and ps_partkey = l_partkey
24                and p_partkey = l_partkey
25                and o_orderkey = l_orderkey
26                and s_nationkey = n_nationkey
27                and p_name like '%green%'
28        ) as profit
29    group by
30        nation,
31        o_year
32    order by
33        nation,
34        o_year desc;上述 /*+ leading(orders shuffle {lineitem shuffle part} shuffle {supplier broadcast nation} shuffle partsupp) */ hint 指定方式,混用了 leading 和 distribute hint 两种格式。leading 用于控制总体的表之间的相对 join 顺序,而 shuffle 和 broadcast 分别用于指定特定 join 使用何种 shuffle 方式。通过两种结合使用,可以灵活的控制连接顺序和连接方式,便于手工控制用户期望的计划行为。
使用建议:
- 建议使用 EXPLAIN 来仔细分析执行计划,以确保 Leading Hint 能达到预期的效果。
- PALO 版本升级或者业务数据变更时,应重新评估 Leading Hint 的效果,做到及时记录和调整。
总结
Leading Hint 是一种强大的可以手工控制连接顺序的功能,于此同时,也可以和 shuffle hint 结合使用,同时控制 join 分发方式,进而优化查询性能。注意这种高级特性,应当在充分理解查询特性及数据分布的基础上谨慎使用。
