假设我有表student, club, 和student_club:
student
club
student_club
student { id name } club { id name } student_club { student_id club_id }
我想知道如何找到足球 (30) 和棒球 (50) 俱乐部的所有学生。 虽然这个查询不起作用,但它是我迄今为止最接近的事情:
SELECT student.* FROM student INNER JOIN student_club sc ON student.id = sc.student_id LEFT JOIN club c ON c.id = sc.club_id WHERE c.id = 30 AND c.id = 50
我很好奇。众所周知,好奇心以杀死猫而闻名。
本次测试的cat-skinning环境:
student.id
student.stud_id
club.id
club.club_id
EXPLAIN ANALYZE
ALTER TABLE student ADD CONSTRAINT student_pkey PRIMARY KEY(stud_id ); ALTER TABLE student_club ADD CONSTRAINT sc_pkey PRIMARY KEY(stud_id, club_id); ALTER TABLE club ADD CONSTRAINT club_pkey PRIMARY KEY(club_id ); CREATE INDEX sc_club_id_idx ON student_club (club_id);
club_pkey此处的大多数查询不需要。 主键在 PostgreSQL 中自动实现唯一索引。 最后一个索引是为了弥补PostgreSQL上多列索引的这个众所周知的缺点:
club_pkey
多列 B 树索引可与涉及索引列的任何子集的查询条件一起使用,但当对前导(最左侧)列有约束时,索引最有效。
总运行时间来自EXPLAIN ANALYZE.
SELECT s.stud_id, s.name FROM student s JOIN student_club sc USING (stud_id) WHERE sc.club_id IN (30, 50) GROUP BY 1,2 HAVING COUNT(*) > 1;
SELECT s.stud_id, s.name FROM student s JOIN ( SELECT stud_id FROM student_club WHERE club_id IN (30, 50) GROUP BY 1 HAVING COUNT(*) > 1 ) sc USING (stud_id);
SELECT s.stud_id, s.name FROM student s WHERE student_id IN ( SELECT student_id FROM student_club WHERE club_id = 30 INTERSECT SELECT stud_id FROM student_club WHERE club_id = 50 );
SELECT s.stud_id, s.name FROM student s WHERE s.stud_id IN (SELECT stud_id FROM student_club WHERE club_id = 30) AND s.stud_id IN (SELECT stud_id FROM student_club WHERE club_id = 50);
SELECT s.stud_id, s.name FROM student s WHERE EXISTS (SELECT * FROM student_club WHERE stud_id = s.stud_id AND club_id = 30) AND EXISTS (SELECT * FROM student_club WHERE stud_id = s.stud_id AND club_id = 50);
SELECT s.stud_id, s.name FROM student s JOIN student_club x ON s.stud_id = x.stud_id JOIN student_club y ON s.stud_id = y.stud_id WHERE x.club_id = 30 AND y.club_id = 50;
最后三个表现几乎相同。4) 和 5) 导致相同的查询计划。
花式SQL,但性能跟不上:
SELECT s.stud_id, s.name FROM student AS s WHERE NOT EXISTS ( SELECT * FROM club AS c WHERE c.club_id IN (30, 50) AND NOT EXISTS ( SELECT * FROM student_club AS sc WHERE sc.stud_id = s.stud_id AND sc.club_id = c.club_id ) );
SELECT s.stud_id, s.name FROM student AS s WHERE NOT EXISTS ( SELECT * FROM ( SELECT 30 AS club_id UNION ALL SELECT 50 ) AS c WHERE NOT EXISTS ( SELECT * FROM student_club AS sc WHERE sc.stud_id = s.stud_id AND sc.club_id = c.club_id ) );
正如预期的那样,这两者的表现几乎相同。查询计划导致表扫描,计划器在这里找不到使用索引的方法。
WITH RECURSIVE two AS ( SELECT 1::int AS level , stud_id FROM student_club sc1 WHERE sc1.club_id = 30 UNION SELECT two.level + 1 AS level , sc2.stud_id FROM student_club sc2 JOIN two USING (stud_id) WHERE sc2.club_id = 50 AND two.level = 1 ) SELECT s.stud_id, s.student FROM student s JOIN two USING (studid) WHERE two.level > 1;
花式 SQL,CTE 性能不错。非常奇特的查询计划。
WITH sc AS ( SELECT stud_id FROM student_club WHERE club_id IN (30,50) GROUP BY stud_id HAVING COUNT(*) > 1 ) SELECT s.* FROM student s JOIN sc USING (stud_id);
查询 2) 的 CTE 变体。令人惊讶的是,对于完全相同的数据,它可能会导致略有不同的查询计划。我在 上找到了一个顺序扫描student,其中子查询变体使用了索引。
另一个后期添加 ypercube。真是太神奇了,有多少种方法。
SELECT s.stud_id, s.student FROM student s JOIN student_club sc USING (stud_id) WHERE sc.club_id = 10 -- member in 1st club ... AND NOT EXISTS ( SELECT * FROM (SELECT 14 AS club_id) AS c -- can't be excluded for missing the 2nd WHERE NOT EXISTS ( SELECT * FROM student_club AS d WHERE d.stud_id = sc.stud_id AND d.club_id = c.club_id ) );
ypercube 的 11) 实际上只是这个更简单变体的令人费解的反向方法,它也仍然缺失。执行速度几乎与顶级猫科动物一样快。
SELECT s.* FROM student s JOIN student_club x USING (stud_id) WHERE sc.club_id = 10 -- member in 1st club ... AND EXISTS ( -- ... and membership in 2nd exists SELECT * FROM student_club AS y WHERE y.stud_id = s.stud_id AND y.club_id = 14 );
难以置信,但这是另一个真正的新变体。我看到了两个以上会员资格的潜力,但它也仅凭两个就跻身顶级among 之列。
SELECT s.* FROM student AS s WHERE EXISTS ( SELECT * FROM student_club AS x JOIN student_club AS y USING (stud_id) WHERE x.stud_id = s.stud_id AND x.club_id = 14 AND y.club_id = 10 );
换句话说:不同数量的过滤器。这个问题要求恰好有两个俱乐部会员资格。但是许多用例必须为不同的数量做准备。