I have a few duplicates in a database that I want to inspect, so to see which values are duplicated, I ran this:
SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1
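This returns every relevant_field value that occurs more than once. The query takes milliseconds to execute.
Now I wanted to inspect each of the duplicates, so I thought I could select every row in some_table whose relevant_field appears in the result of the query above, so I did this: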
SELECT * FROM some_table WHERE relevant_field IN ( SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1 )
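For some reason this turns out to be extremely slow (it takes minutes). What exactly is going on here to make it that slow? relevant_field is indexed.
Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then writing my second query like this instead: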
SELECT * FROM some_table WHERE relevant_field IN ( SELECT relevant_field FROM temp_view )
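And that works just fine. MySQL does this in a few milliseconds.
Can any SQL experts here explain what is going on?
The subquery is being run once for each row of the outer query because MySQL is treating it as a correlated query. Although the subquery as written does not reference the outer table, older MySQL optimizers rewrite an IN (subquery) predicate into a correlated form. You can turn the correlated query into a non-correlated one by selecting everything from the subquery, like so: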
SELECT * FROM ( SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1 ) AS subquery
The final query would look like this:
SELECT * FROM some_table WHERE relevant_field IN ( SELECT * FROM ( SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1 ) AS subquery )
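To confirm the diagnosis and to show another common way of writing the same thing, here is a rough sketch. It assumes the table and column names from the question; the aliases `t` and `dup` are made up for illustration. On MySQL versions that exhibit the slowdown, EXPLAIN should report the inner SELECT with a select_type of DEPENDENT SUBQUERY, though the exact plan depends on your version and data. The JOIN form should return the same rows as the IN form, since the derived table holds each duplicated value only once.

```sql
-- Inspect the plan of the slow query. A select_type of
-- DEPENDENT SUBQUERY indicates the subquery is re-evaluated
-- for each row of the outer query.
EXPLAIN
SELECT *
FROM some_table
WHERE relevant_field IN (
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
);

-- JOIN form: the derived table `dup` (hypothetical alias) is
-- computed once and then matched against some_table, where the
-- index on relevant_field can be used for the lookup.
SELECT t.*
FROM some_table AS t
INNER JOIN (
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
) AS dup ON dup.relevant_field = t.relevant_field;
```

Running EXPLAIN on the JOIN version as well lets you compare the two plans side by side before deciding which form to keep.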