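I'm looking for some "inference rules" (similar to set operation rules or logic rules) that I can use to reduce a SQL query in complexity or size. Does something like that exist? Are there any papers or tools? Any equivalences you have found on your own? It is somewhat similar to query optimization, but the goal is not performance.
To put it differently: given a (complex) query with JOINs, SUBSELECTs and UNIONs, is it possible (or not) to reduce it to a simpler, equivalent SQL statement that produces the same result by applying some transformation rules?
So I'm looking for equivalence transformations of SQL statements, like the fact that most SUBSELECTs can be rewritten as a JOIN.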
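That's exactly what optimizers do for a living (not that I'm saying they always do it well).
Since SQL is a set-based language, there is usually more than one way to transform one query into another.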
Like this query:
SELECT * FROM mytable WHERE col1 > @value1 OR col2 < @value2
can be transformed into this:
SELECT * FROM mytable WHERE col1 > @value1 UNION SELECT * FROM mytable WHERE col2 < @value2
or this:
SELECT mo.* FROM ( SELECT id FROM mytable WHERE col1 > @value1 UNION SELECT id FROM mytable WHERE col2 < @value2 ) mi JOIN mytable mo ON mo.id = mi.id
, which looks uglier but can yield a better execution plan.
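To check that the three forms above really return the same rows, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine. The sample data and the `fetch_sorted` helper are made up for illustration. Note the assumption that rows are unique (here via a primary key): UNION removes duplicates, so on a table with duplicate rows it would not match the OR form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable (id, col1, col2) VALUES (?, ?, ?)",
    [(1, 5, 50), (2, 15, 50), (3, 5, 10), (4, 15, 10), (5, None, 10), (6, None, None)],
)
params = {"value1": 10, "value2": 20}


def fetch_sorted(sql):
    """Hypothetical helper: run a query and sort the rows by id for comparison."""
    return sorted(conn.execute(sql, params).fetchall(), key=lambda row: row[0])


# Form 1: a single scan with OR.
or_rows = fetch_sorted(
    "SELECT * FROM mytable WHERE col1 > :value1 OR col2 < :value2"
)

# Form 2: one query per condition, combined with UNION (which deduplicates).
union_rows = fetch_sorted(
    """
    SELECT * FROM mytable WHERE col1 > :value1
    UNION
    SELECT * FROM mytable WHERE col2 < :value2
    """
)

# Form 3: UNION over ids only, then join back to fetch the full rows.
join_rows = fetch_sorted(
    """
    SELECT mo.*
    FROM (
        SELECT id FROM mytable WHERE col1 > :value1
        UNION
        SELECT id FROM mytable WHERE col2 < :value2
    ) mi
    JOIN mytable mo ON mo.id = mi.id
    """
)

print(or_rows == union_rows == join_rows)
```

Rows with NULL in one column still match through the other condition, and a row with NULL in both is excluded by all three forms.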
One of the most common rewrites is replacing this query:
SELECT * FROM mytable WHERE col IN ( SELECT othercol FROM othertable )
with this one:
SELECT * FROM mytable mo WHERE EXISTS ( SELECT NULL FROM othertable o WHERE o.othercol = mo.col )
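A small sketch of the IN versus EXISTS equivalence, again assuming SQLite through Python's sqlite3 module and invented sample data. It also runs a plain JOIN for contrast, because a JOIN repeats a row for every duplicate match and so is not a drop-in replacement without DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col INTEGER)")
conn.execute("CREATE TABLE othertable (othercol INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 10), (2, 20), (3, 30), (4, None)])
# othertable deliberately holds a duplicate value and a NULL.
conn.executemany("INSERT INTO othertable VALUES (?)", [(10,), (10,), (30,), (None,)])


def ids(sql):
    """Hypothetical helper: return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]


in_ids = ids(
    "SELECT id FROM mytable WHERE col IN (SELECT othercol FROM othertable) ORDER BY id"
)

exists_ids = ids(
    """
    SELECT mo.id FROM mytable mo
    WHERE EXISTS (SELECT NULL FROM othertable o WHERE o.othercol = mo.col)
    ORDER BY mo.id
    """
)

# A plain inner join returns id 1 twice because 10 appears twice in othertable.
join_ids = ids(
    """
    SELECT mo.id FROM mytable mo
    JOIN othertable o ON o.othercol = mo.col
    ORDER BY mo.id
    """
)

print(in_ids, exists_ids, join_ids)
```

This equivalence holds for IN. NOT IN and NOT EXISTS behave differently once NULLs are involved, so that pair needs more care.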
In some RDBMSs (like PostgreSQL), DISTINCT and GROUP BY use different execution plans, so sometimes it is better to replace one with the other:
SELECT mo.grouper, ( SELECT SUM(col) FROM mytable mi WHERE mi.grouper = mo.grouper ) FROM ( SELECT DISTINCT grouper FROM mytable ) mo
vs.
SELECT mo.grouper, SUM(col) FROM mytable mo GROUP BY mo.grouper
In PostgreSQL, DISTINCT sorts and GROUP BY hashes.
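The two forms can be compared on a toy table. This is a sketch under stated assumptions: SQLite via Python's sqlite3 stands in for PostgreSQL (so it says nothing about plans, only about results), the data is invented, and `grouper` is assumed to be NOT NULL. With a NULL grouper the correlated subquery would find no match (`NULL = NULL` is not true) while GROUP BY would still aggregate that group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (grouper TEXT NOT NULL, col INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 10), ("c", None)],
)


def fetch_sorted(sql):
    """Hypothetical helper: run a query and sort rows by the grouping key."""
    return sorted(conn.execute(sql).fetchall(), key=lambda row: row[0])


# DISTINCT to list the groups, plus a correlated subquery per group.
distinct_rows = fetch_sorted(
    """
    SELECT mo.grouper,
           (SELECT SUM(col) FROM mytable mi WHERE mi.grouper = mo.grouper)
    FROM (SELECT DISTINCT grouper FROM mytable) mo
    """
)

# Plain GROUP BY.
group_rows = fetch_sorted(
    "SELECT mo.grouper, SUM(col) FROM mytable mo GROUP BY mo.grouper"
)

print(distinct_rows == group_rows)
```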
MySQL lacks FULL OUTER JOIN, so this query:
SELECT t1.col1, t2.col2 FROM table1 t1 FULL OUTER JOIN table2 t2 ON t1.id = t2.id
can be rewritten as follows:
SELECT t1.col1, t2.col2 FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id UNION ALL SELECT NULL, t2.col2 FROM table1 t1 RIGHT JOIN table2 t2 ON t1.id = t2.id WHERE t1.id IS NULL
, but see the article in my blog on how to do this more efficiently in MySQL.
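A runnable sketch of the LEFT JOIN plus anti-join emulation, assuming SQLite via Python's sqlite3 and invented sample rows. One deliberate swap: the second branch is written as `table2 LEFT JOIN table1` rather than with RIGHT JOIN, because older SQLite builds lack RIGHT JOIN; the two spellings are equivalent. Where the linked SQLite library is 3.39 or newer, the result is also compared against a native FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, col2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(2, "x"), (3, "y")])


def fetch_sorted(sql):
    """Hypothetical helper: sort by repr so rows containing None stay comparable."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


# Branch 1 keeps every table1 row; branch 2 adds table2 rows with no partner.
emulated = fetch_sorted(
    """
    SELECT t1.col1, t2.col2
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id
    UNION ALL
    SELECT NULL, t2.col2
    FROM table2 t2
    LEFT JOIN table1 t1 ON t1.id = t2.id
    WHERE t1.id IS NULL
    """
)

# Native FULL OUTER JOIN exists in SQLite 3.39+; skip the comparison otherwise.
native = None
if sqlite3.sqlite_version_info >= (3, 39, 0):
    native = fetch_sorted(
        """
        SELECT t1.col1, t2.col2
        FROM table1 t1
        FULL OUTER JOIN table2 t2 ON t1.id = t2.id
        """
    )

print(emulated)
```

UNION ALL is safe here because the `t1.id IS NULL` filter makes the two branches disjoint.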
This hierarchical query in Oracle:
SELECT DISTINCT(animal_id) AS animal_id FROM animal START WITH animal_id = :id CONNECT BY PRIOR animal_id IN (father, mother) ORDER BY animal_id
can be transformed into this:
SELECT DISTINCT(animal_id) AS animal_id FROM ( SELECT 0 AS gender, animal_id, father AS parent FROM animal UNION ALL SELECT 1, animal_id, mother FROM animal ) START WITH animal_id = :id CONNECT BY parent = PRIOR animal_id ORDER BY animal_id
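, the latter being more efficient.
See the article in my blog for the execution plan details.
To find all ranges that overlap a given range, you can use the following query: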
SELECT * FROM ranges WHERE end_date >= @start AND start_date <= @end
, but in SQL Server this more complex query yields the same results faster:
SELECT * FROM ranges WHERE (start_date > @start AND start_date <= @end) OR (@start BETWEEN start_date AND end_date)
, and believe it or not, I have an article in my blog on this too.
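The equivalence of the two overlap predicates can be checked by brute force. This is only a sketch: SQLite via Python's sqlite3 stands in for SQL Server, small integers stand in for dates, and it assumes every stored range has `start_date <= end_date` and that the probe satisfies `start <= end`. Without those assumptions the two predicates can disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranges (start_date INTEGER, end_date INTEGER)")
# Every well-formed range over the integer "dates" 0..7.
conn.executemany(
    "INSERT INTO ranges VALUES (?, ?)",
    [(s, e) for s in range(8) for e in range(s, 8)],
)

SIMPLE = """
    SELECT start_date, end_date FROM ranges
    WHERE end_date >= :start AND start_date <= :end
    ORDER BY start_date, end_date
"""

# Split form: ranges starting inside the probe, or ranges containing its start.
SPLIT = """
    SELECT start_date, end_date FROM ranges
    WHERE (start_date > :start AND start_date <= :end)
       OR (:start BETWEEN start_date AND end_date)
    ORDER BY start_date, end_date
"""

# Compare both queries for every well-formed probe range.
mismatches = []
for start in range(8):
    for end in range(start, 8):
        probe = {"start": start, "end": end}
        if conn.execute(SIMPLE, probe).fetchall() != conn.execute(SPLIT, probe).fetchall():
            mismatches.append((start, end))

print(mismatches)
```

The split form works because a range with `start_date > start` automatically ends after `start`, and a range with `start_date <= start` automatically begins before `end`, so each branch only needs one remaining check.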
SQL Server also lacks an efficient way to do cumulative aggregates, so this query:
SELECT mi.id, SUM(mo.value) AS running_sum FROM mytable mi JOIN mytable mo ON mo.id <= mi.id GROUP BY mi.id
can be more efficiently rewritten using, Lord help me, cursors (you heard me right: cursors, more efficiently and SQL Server in one sentence).
See the article in my blog on how to do it.
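To illustrate why a cursor can win here, the sketch below compares the self-join running sum with a single ordered pass that carries an accumulator. It is not the T-SQL cursor from the blog article, only an analogy written in Python over SQLite (via sqlite3) with invented data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 10), (2, 20), (3, 5), (4, -3)])

# Self-join: each row is joined to all rows at or before it, so the work
# grows roughly quadratically with the table size.
join_rows = conn.execute(
    """
    SELECT mi.id, SUM(mo.value) AS running_sum
    FROM mytable mi
    JOIN mytable mo ON mo.id <= mi.id
    GROUP BY mi.id
    ORDER BY mi.id
    """
).fetchall()

# Cursor-style: one ordered pass, carrying the running total along.
cursor_rows = []
running_sum = 0
for row_id, value in conn.execute("SELECT id, value FROM mytable ORDER BY id"):
    running_sum += value
    cursor_rows.append((row_id, running_sum))

print(join_rows == cursor_rows)
```

On engines that support ordered window aggregates, `SUM(value) OVER (ORDER BY id)` expresses the same single-pass idea declaratively.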
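There is a certain kind of query, commonly met in financial applications, that looks up the effective rate for a currency, like this one in Oracle: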
SELECT TO_CHAR(SUM(xac_amount * rte_rate), 'FM999G999G999G999G999G999D999999') FROM t_transaction x JOIN t_rate r ON (rte_currency, rte_date) IN ( SELECT xac_currency, MAX(rte_date) FROM t_rate WHERE rte_currency = xac_currency AND rte_date <= xac_date )
This query can be heavily rewritten to use an equality condition, which allows a HASH JOIN instead of NESTED LOOPS:
WITH v_rate AS (
    SELECT cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
    FROM (
        SELECT cur_id, dte_date,
               ( SELECT MAX(rte_date) FROM t_rate ri WHERE rte_currency = cur_id AND rte_date <= dte_date ) AS rte_effdate
        FROM (
            SELECT ( SELECT MAX(rte_date) FROM t_rate ) - level + 1 AS dte_date
            FROM dual
            CONNECT BY level <= ( SELECT MAX(rte_date) - MIN(rte_date) FROM t_rate )
        ) v_date,
        (
            SELECT 1 AS cur_id FROM dual
            UNION ALL
            SELECT 2 AS cur_id FROM dual
        ) v_currency
    ) v_eff
    LEFT JOIN t_rate ON rte_currency = cur_id AND rte_date = rte_effdate
)
SELECT TO_CHAR(SUM(xac_amount * eff_rate), 'FM999G999G999G999G999G999D999999')
FROM (
    SELECT xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt
    FROM t_transaction x
    GROUP BY xac_currency, TRUNC(xac_date)
)
JOIN v_rate ON eff_currency = xac_currency AND eff_date = xac_date
Despite being bulky as hell, the latter query is 6 times faster.
The main idea here is replacing <= with =, which requires building an in-memory calendar table to JOIN with.
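The calendar idea can be shown at toy scale. This is a sketch rather than the Oracle query itself: it assumes SQLite via Python's sqlite3, uses integer day numbers instead of dates, builds the calendar with a recursive CTE instead of CONNECT BY, and (my choice, not the original's) extends the calendar up to the latest transaction date. Table and column names follow the queries above; the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_rate (rte_currency INTEGER, rte_date INTEGER, rte_rate REAL)")
conn.execute("CREATE TABLE t_transaction (xac_currency INTEGER, xac_date INTEGER, xac_amount INTEGER)")
conn.executemany(
    "INSERT INTO t_rate VALUES (?, ?, ?)",
    [(1, 1, 1.5), (1, 4, 2.0), (2, 1, 0.5), (2, 3, 0.25)],
)
conn.executemany(
    "INSERT INTO t_transaction VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 3, 10), (1, 4, 20), (1, 6, 2), (2, 2, 40), (2, 3, 8), (2, 3, 8)],
)

# Inequality form: per transaction, find the latest rate on or before its date.
lookup_total = conn.execute(
    """
    SELECT SUM(x.xac_amount * r.rte_rate)
    FROM t_transaction x
    JOIN t_rate r
      ON r.rte_currency = x.xac_currency
     AND r.rte_date = (SELECT MAX(ri.rte_date) FROM t_rate ri
                       WHERE ri.rte_currency = x.xac_currency
                         AND ri.rte_date <= x.xac_date)
    """
).fetchone()[0]

# Equality form: expand the rates to one row per currency per day (v_rate),
# then join the pre-aggregated transactions to it with plain "=".
calendar_total = conn.execute(
    """
    WITH RECURSIVE v_date(dte_date) AS (
        SELECT MIN(rte_date) FROM t_rate
        UNION ALL
        SELECT dte_date + 1 FROM v_date
        WHERE dte_date < (SELECT MAX(xac_date) FROM t_transaction)
    ),
    v_currency AS (SELECT DISTINCT rte_currency AS cur_id FROM t_rate),
    v_eff AS (
        SELECT cur_id, dte_date,
               (SELECT MAX(rte_date) FROM t_rate
                WHERE rte_currency = cur_id AND rte_date <= dte_date) AS rte_effdate
        FROM v_date CROSS JOIN v_currency
    ),
    v_rate AS (
        SELECT cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
        FROM v_eff
        JOIN t_rate ON rte_currency = cur_id AND rte_date = rte_effdate
    )
    SELECT SUM(x.xac_amount * v.eff_rate)
    FROM (SELECT xac_currency, xac_date, SUM(xac_amount) AS xac_amount
          FROM t_transaction
          GROUP BY xac_currency, xac_date) x
    JOIN v_rate v ON v.eff_currency = x.xac_currency AND v.eff_date = x.xac_date
    """
).fetchone()[0]

print(lookup_total, calendar_total)
```

The inequality lookup now runs once per calendar day per currency instead of once per transaction, which is the trade that pays off when transactions greatly outnumber days.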