我有一个看起来像这样的表:
--------------------------- |housing_id | facility_id | --------------------------- | 1 | 7 | | 1 | 4 | | 2 | 7 | ---------------------------
现在,我要执行的操作是获取具有两位7和4的facility_id的所有housing_id。因此,在这种情况下,查询应仅返回housing_id 1。数据库是mysql。
另一种方法是-
SELECT housing_id FROM mytable WHERE facility_id IN (4,7) GROUP BY housing_id HAVING COUNT(DISTINCT facility_id) = 2
更新 -受约瑟夫(Josvic)评论的启发,我决定进行更多测试,并认为我将包括我的发现。
使用此查询的好处之一是很容易进行修改以包含更多的facility_id。如果您要查找所有具有facility_ids 1、3、4和7的housing_id,则只需-
SELECT housing_id FROM mytable WHERE facility_id IN (1,3,4,7) GROUP BY housing_id HAVING COUNT(DISTINCT facility_id) = 4
根据所采用的索引策略,所有这三个查询的性能差异很大。无论使用什么索引,我都无法从依赖子查询版本中获得测试数据集上的合理性能。
如果在两列上使用单独的单列索引,Tim提供的自连接解决方案可以很好地执行,但随着标准数量的增加,它的执行效果将不尽人意。
这是我的测试表上的一些基本统计信息-50万行-147963房屋ID,idacity_id的潜在值在1到9之间。
这是用于运行所有这些测试的索引-
SHOW INDEXES FROM mytable; +---------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | +---------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+ | mytable | 0 | UQ_housing_facility | 1 | housing_id | A | 500537 | NULL | NULL | | BTREE | | mytable | 0 | UQ_housing_facility | 2 | facility_id | A | 500537 | NULL | NULL | | BTREE | | mytable | 0 | UQ_facility_housing | 1 | facility_id | A | 12 | NULL | NULL | | BTREE | | mytable | 0 | UQ_facility_housing | 2 | housing_id | A | 500537 | NULL | NULL | | BTREE | | mytable | 1 | IX_housing | 1 | housing_id | A | 500537 | NULL | NULL | | BTREE | | mytable | 1 | IX_facility | 1 | facility_id | A | 12 | NULL | NULL | | BTREE | +---------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
测试的第一个查询是从属子查询-
SELECT SQL_NO_CACHE DISTINCT housing_id FROM mytable WHERE housing_id IN (SELECT housing_id FROM mytable WHERE facility_id=4) AND housing_id IN (SELECT housing_id FROM mytable WHERE facility_id=7); 17321 rows in set (9.15 sec) +----+--------------------+---------+-----------------+----------------------------------------------------------------+---------------------+---------+------------+--------+---------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+---------+-----------------+----------------------------------------------------------------+---------------------+---------+------------+--------+---------------------------------------+ | 1 | PRIMARY | mytable | range | NULL | IX_housing | 4 | NULL | 500538 | Using where; Using index for group-by | | 3 | DEPENDENT SUBQUERY | mytable | unique_subquery | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | UQ_housing_facility | 8 | func,const | 1 | Using index; Using where | | 2 | DEPENDENT SUBQUERY | mytable | unique_subquery | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | UQ_housing_facility | 8 | func,const | 1 | Using index; Using where | +----+--------------------+---------+-----------------+----------------------------------------------------------------+---------------------+---------+------------+--------+---------------------------------------+ SELECT SQL_NO_CACHE DISTINCT housing_id FROM mytable WHERE housing_id IN (SELECT housing_id FROM mytable WHERE facility_id=1) AND housing_id IN (SELECT housing_id FROM mytable WHERE facility_id=3) AND housing_id IN (SELECT housing_id FROM mytable WHERE facility_id=4) AND housing_id IN (SELECT housing_id FROM mytable WHERE facility_id=7); 567 rows in set (9.30 sec) +----+--------------------+---------+-----------------+----------------------------------------------------------------+---------------------+---------+------------+--------+---------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+---------+-----------------+----------------------------------------------------------------+---------------------+---------+------------+--------+---------------------------------------+ | 1 | PRIMARY | mytable | range | NULL | IX_housing | 4 | NULL | 500538 | Using where; Using index for group-by | | 5 | DEPENDENT SUBQUERY | mytable | unique_subquery | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | UQ_housing_facility | 8 | func,const | 1 | Using index; Using where | | 4 | DEPENDENT SUBQUERY | mytable | unique_subquery | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | UQ_housing_facility | 8 | func,const | 1 | Using index; Using where | | 3 | DEPENDENT SUBQUERY | mytable | unique_subquery | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | UQ_housing_facility | 8 | func,const | 1 | Using index; Using where | | 2 | DEPENDENT SUBQUERY | mytable | unique_subquery | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | UQ_housing_facility | 8 | func,const | 1 | Using index; Using where | +----+--------------------+---------+-----------------+----------------------------------------------------------------+---------------------+---------+------------+--------+---------------------------------------+
接下来是我使用GROUP BY … HAVING COUNT …的版本
SELECT SQL_NO_CACHE housing_id FROM mytable WHERE facility_id IN (4,7) GROUP BY housing_id HAVING COUNT(DISTINCT facility_id) = 2; 17321 rows in set (0.79 sec) +----+-------------+---------+-------+---------------------------------+-------------+---------+------+--------+------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+-------+---------------------------------+-------------+---------+------+--------+------------------------------------------+ | 1 | SIMPLE | mytable | range | UQ_facility_housing,IX_facility | IX_facility | 4 | NULL | 198646 | Using where; Using index; Using filesort | +----+-------------+---------+-------+---------------------------------+-------------+---------+------+--------+------------------------------------------+ SELECT SQL_NO_CACHE housing_id FROM mytable WHERE facility_id IN (1,3,4,7) GROUP BY housing_id HAVING COUNT(DISTINCT facility_id) = 4; 567 rows in set (1.25 sec) +----+-------------+---------+-------+---------------------------------+-------------+---------+------+--------+------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+-------+---------------------------------+-------------+---------+------+--------+------------------------------------------+ | 1 | SIMPLE | mytable | range | UQ_facility_housing,IX_facility | IX_facility | 4 | NULL | 407160 | Using where; Using index; Using filesort | +----+-------------+---------+-------+---------------------------------+-------------+---------+------+--------+------------------------------------------+
最后但并非最不重要的一点是自我加入-
SELECT SQL_NO_CACHE a.housing_id FROM mytable a INNER JOIN mytable b ON a.housing_id = b.housing_id WHERE a.facility_id = 4 AND b.facility_id = 7; 17321 rows in set (1.37 sec) +----+-------------+-------+--------+----------------------------------------------------------------+---------------------+---------+-------------------------+-------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------+---------------------+---------+-------------------------+-------+-------------+ | 1 | SIMPLE | b | ref | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | IX_facility | 4 | const | 94598 | Using index | | 1 | SIMPLE | a | eq_ref | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | UQ_housing_facility | 8 | test.b.housing_id,const | 1 | Using index | +----+-------------+-------+--------+----------------------------------------------------------------+---------------------+---------+-------------------------+-------+-------------+ SELECT SQL_NO_CACHE a.housing_id FROM mytable a INNER JOIN mytable b ON a.housing_id = b.housing_id INNER JOIN mytable c ON a.housing_id = c.housing_id INNER JOIN mytable d ON a.housing_id = d.housing_id WHERE a.facility_id = 1 AND b.facility_id = 3 AND c.facility_id = 4 AND d.facility_id = 7; 567 rows in set (1.64 sec) +----+-------------+-------+--------+----------------------------------------------------------------+---------------------+---------+-------------------------+-------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------+---------------------+---------+-------------------------+-------+--------------------------+ | 1 | SIMPLE | b | ref | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | IX_facility | 4 | const | 93782 | Using index | | 1 | SIMPLE | d | eq_ref | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | UQ_housing_facility | 8 | test.b.housing_id,const | 1 | Using index | | 1 | SIMPLE | c | eq_ref | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | UQ_housing_facility | 8 | test.b.housing_id,const | 1 | Using index | | 1 | SIMPLE | a | eq_ref | UQ_housing_facility,UQ_facility_housing,IX_housing,IX_facility | UQ_housing_facility | 8 | test.d.housing_id,const | 1 | Using where; Using index | +----+-------------+-------+--------+----------------------------------------------------------------+---------------------+---------+-------------------------+-------+--------------------------+