【StoneDB子查询优化】subquery子查询-优化exists场景的join查询优化器的处理-
摘要:
说明如何优化exists的join查询优化器的处理
核心函数:
TwoDimensionalJoiner::ChooseJoinAlgorithm
JoinAlgType TwoDimensionalJoiner::ChooseJoinAlgorithm([[maybe_unused]] MultiIndex &mind, Condition &cond) {
JoinAlgType join_alg = JoinAlgType::JTYPE_GENERAL;
if (cond[0].IsType_JoinSimple() && cond[0].op == common::Operator::O_EQ) {
if ((cond.Size() == 1) && !stonedb_sysvar_force_hashjoin)
join_alg = JoinAlgType::JTYPE_MAP; // available types checked inside
else
join_alg = JoinAlgType::JTYPE_HASH;
} else {
if (cond[0].IsType_JoinSimple() &&
(cond[0].op == common::Operator::O_MORE_EQ || cond[0].op == common::Operator::O_MORE ||
cond[0].op == common::Operator::O_LESS_EQ || cond[0].op == common::Operator::O_LESS))
join_alg = JoinAlgType::JTYPE_SORT;
}
return join_alg;
}
选择join优化器问题分析:
- 仅判定join simple场景,未判断exists子句
- cond[0].IsType_JoinSimple() 如果走入了else分支,相当于被执行了两次
ChooseJoinAlgorithm函数优化:
- 加入exists的判定, 以 IsType_JoinSimple 和 == common::Operator::O_EQ条件对待
- 优化代码结构, 清理冗余的cond[0].IsType_JoinSimple()执行
- 其他逻辑不做任何修改
JoinAlgType TwoDimensionalJoiner::ChooseJoinAlgorithm([[maybe_unused]] MultiIndex &mind, Condition &cond) {
do {
if (cond[0].IsExists()) {
break;
}
if (!cond[0].IsType_JoinSimple()) {
return JoinAlgType::JTYPE_GENERAL;
}
if (cond[0].op == common::Operator::O_EQ) {
break;
}
if (cond[0].op == common::Operator::O_MORE_EQ || cond[0].op == common::Operator::O_MORE ||
cond[0].op == common::Operator::O_LESS_EQ || cond[0].op == common::Operator::O_LESS) {
return JoinAlgType::JTYPE_SORT;
}
} while (0);
JoinAlgType join_alg = JoinAlgType::JTYPE_HASH;
if ((!stonedb_sysvar_force_hashjoin) && (cond.Size() == 1))
join_alg = JoinAlgType::JTYPE_MAP; // available types checked inside
return join_alg;
}
代码优化后exists场景分析:
- 如果未开启强制hash join查询, 且cond.Size() == 1, 则进行JTYPE_MAP查询
- 需要强制开启hash join才可进入hash join查询, 当前测试不开启强制的hash join. 以JTYPE_MAP进行测试
优化走JTYPE_MAP查询测试:
MAP子查询耗时:
mysql> select
-> o_orderpriority,
-> count(*) as order_count
-> from
-> orders
-> where
-> o_orderdate >= date '1993-07-01'
-> and o_orderdate < date '1993-07-01' + interval '3' month
-> and exists (
-> select
-> *
-> from
-> lineitem
-> where
-> l_orderkey = o_orderkey
-> and l_commitdate < l_receiptdate
-> )
-> group by
-> o_orderpriority
-> order by
-> o_orderpriority ;
+-----------------+-------------+
| o_orderpriority | order_count |
+-----------------+-------------+
| 1-URGENT | 1147477 |
| 2-HIGH | 1146447 |
| 3-MEDIUM | 1146770 |
| 4-NOT SPECIFIED | 1146281 |
| 5-LOW | 1146801 |
+-----------------+-------------+
5 rows in set (27.36 sec)
MAP子查询对比之前的子查询耗时:
JTYPE_MAP逻辑的火焰图
强制走JTYPE_HASH查询测试:
强制开启hash join优化, 对比同样场景下与map查询的区别
HASH子查询耗时:
mysql> select
-> o_orderpriority,
-> count(*) as order_count
-> from
-> orders
-> where
-> o_orderdate >= date '1993-07-01'
-> and o_orderdate < date '1993-07-01' + interval '3' month
-> and exists (
-> select
-> *
-> from
-> lineitem
-> where
-> l_orderkey = o_orderkey
-> and l_commitdate < l_receiptdate
-> )
-> group by
-> o_orderpriority
-> order by
-> o_orderpriority ;
+-----------------+-------------+
| o_orderpriority | order_count |
+-----------------+-------------+
| 1-URGENT | 1147477 |
| 2-HIGH | 1146447 |
| 3-MEDIUM | 1146770 |
| 4-NOT SPECIFIED | 1146281 |
| 5-LOW | 1146801 |
+-----------------+-------------+
5 rows in set (27.60 sec)
HASH子查询的火焰图:
转载自:https://segmentfault.com/a/1190000042276870