我目前有一个由于OR语句而导致的PostgreSQL查询缓慢。因此,显然没有使用索引。到目前为止,此查询的重写失败。
查询:
EXPLAIN ANALYZE SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id WHERE a0_.advert_category_id IN ( 1136 ) OR a1_.parent_id IN ( 1136 ) ORDER BY a0_.created_date DESC LIMIT 15; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..27542.49 rows=15 width=12) (actual time=1.658..50.809 rows=15 loops=1) -> Nested Loop (cost=0.00..1691109.07 rows=921 width=12) (actual time=1.657..50.790 rows=15 loops=1) -> Index Scan Backward using advert_created_date_idx on advert a0_ (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.013..16.449 rows=12405 loops=1) -> Index Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405) Index Cond: (id = a0_.advert_category_id) Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136)) Rows Removed by Filter: 1 Total runtime: 50.860 ms
缓慢的原因: Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))
Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))
我尝试使用INNER JOIN代替WHERE语句:
EXPLAIN ANALYZE SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id AND ( a0_.advert_category_id IN ( 1136 ) OR a1_.parent_id IN ( 1136 ) ) ORDER BY a0_.created_date DESC LIMIT 15; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..27542.49 rows=15 width=12) (actual time=4.667..139.955 rows=15 loops=1) -> Nested Loop (cost=0.00..1691109.07 rows=921 width=12) (actual time=4.666..139.932 rows=15 loops=1) -> Index Scan Backward using advert_created_date_idx on advert a0_ (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.019..100.765 rows=12405 loops=1) -> Index Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405) Index Cond: (id = a0_.advert_category_id) Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136)) Rows Removed by Filter: 1 Total runtime: 140.048 ms
当我删除OR条件之一时,查询速度加快。因此,我创建了一个UNION来查看结果。非常快!但是我不认为这是解决方案:
EXPLAIN ANALYZE (SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id WHERE a0_.advert_category_id IN ( 1136 ) ORDER BY a0_.created_date DESC LIMIT 15) UNION (SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id WHERE a1_.parent_id IN ( 1136 ) ORDER BY a0_.created_date DESC LIMIT 15); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=4125.70..4126.00 rows=30 width=12) (actual time=7.945..7.951 rows=15 loops=1) -> Append (cost=1120.82..4125.63 rows=30 width=12) (actual time=6.811..7.929 rows=15 loops=1) -> Subquery Scan on "*SELECT* 1" (cost=1120.82..1121.01 rows=15 width=12) (actual time=6.810..6.840 rows=15 loops=1) -> Limit (cost=1120.82..1120.86 rows=15 width=12) (actual time=6.809..6.825 rows=15 loops=1) -> Sort (cost=1120.82..1121.56 rows=295 width=12) (actual time=6.807..6.813 rows=15 loops=1) Sort Key: a0_.created_date Sort Method: top-N heapsort Memory: 25kB -> Nested Loop (cost=10.59..1113.59 rows=295 width=12) (actual time=1.151..6.639 rows=220 loops=1) -> Index Only Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..8.27 rows=1 width=4) (actual time=1.030..1.033 rows=1 loops=1) Index Cond: (id = 1136) Heap Fetches: 1 -> Bitmap Heap Scan on advert a0_ (cost=10.59..1102.37 rows=295 width=16) (actual time=0.099..5.287 rows=220 loops=1) Recheck Cond: (advert_category_id = 1136) -> Bitmap Index Scan on idx_54f1f40bd4436821 (cost=0.00..10.51 rows=295 width=0) (actual time=0.073..0.073 rows=220 loops=1) Index Cond: (advert_category_id = 1136) -> Subquery Scan on "*SELECT* 2" (cost=3004.43..3004.62 rows=15 width=12) (actual time=1.072..1.072 rows=0 loops=1) -> Limit (cost=3004.43..3004.47 rows=15 width=12) (actual time=1.071..1.071 rows=0 loops=1) -> Sort (cost=3004.43..3005.99 rows=626 width=12) (actual time=1.069..1.069 rows=0 loops=1) Sort Key: a0_.created_date Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=22.91..2989.07 rows=626 width=12) (actual time=1.056..1.056 rows=0 loops=1) -> Index Scan using idx_d84ab8ea727aca70 on advertcategory a1_ (cost=0.00..8.27 rows=1 width=4) (actual time=1.054..1.054 rows=0 loops=1) Index Cond: (parent_id = 1136) -> Bitmap Heap Scan on advert a0_ (cost=22.91..2972.27 rows=853 width=16) (never executed) Recheck Cond: (advert_category_id = a1_.id) -> Bitmap Index Scan on idx_54f1f40bd4436821 (cost=0.00..22.70 rows=853 width=0) (never executed) Index Cond: (advert_category_id = a1_.id) Total runtime: 8.940 ms (28 rows)
尝试反转IN语句:
EXPLAIN ANALYZE SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id WHERE 1136 IN ( a0_.advert_category_id, a1_.parent_id ) ORDER BY a0_.created_date DESC LIMIT 15; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..27542.49 rows=15 width=12) (actual time=1.848..62.461 rows=15 loops=1) -> Nested Loop (cost=0.00..1691109.07 rows=921 width=12) (actual time=1.847..62.441 rows=15 loops=1) -> Index Scan Backward using advert_created_date_idx on advert a0_ (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.028..27.316 rows=12405 loops=1) -> Index Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405) Index Cond: (id = a0_.advert_category_id) Filter: ((1136 = a0_.advert_category_id) OR (1136 = parent_id)) Rows Removed by Filter: 1 Total runtime: 62.506 ms (8 rows)
使用EXISTS进行了尝试:
EXPLAIN ANALYZE SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id WHERE EXISTS(SELECT test.id FROM advert test INNER JOIN advertcategory test_cat ON test_cat.id = test.advert_category_id WHERE test.id = a0_.id AND ( test.advert_category_id IN ( 1136 ) OR test_cat.parent_id IN ( 1136 ) )) ORDER BY a0_.created_date DESC LIMIT 15; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=45538.18..45538.22 rows=15 width=12) (actual time=524.654..524.673 rows=15 loops=1) -> Sort (cost=45538.18..45540.48 rows=921 width=12) (actual time=524.651..524.658 rows=15 loops=1) Sort Key: a0_.created_date Sort Method: top-N heapsort Memory: 25kB -> Hash Join (cost=39803.59..45515.58 rows=921 width=12) (actual time=497.362..524.436 rows=220 loops=1) Hash Cond: (a0_.advert_category_id = a1_.id) -> Nested Loop (cost=39786.88..45486.21 rows=921 width=16) (actual time=496.748..523.501 rows=220 loops=1) -> HashAggregate (cost=39786.88..39796.09 rows=921 width=4) (actual time=496.705..496.872 rows=220 loops=1) -> Hash Join (cost=16.71..39784.58 rows=921 width=4) (actual time=1.210..496.294 rows=220 loops=1) Hash Cond: (test.advert_category_id = test_cat.id) Join Filter: ((test.advert_category_id = 1136) OR (test_cat.parent_id = 1136)) Rows Removed by Join Filter: 353584 -> Seq Scan on advert test (cost=0.00..33134.04 rows=353804 width=8) (actual time=0.002..177.953 rows=353804 loops=1) -> Hash (cost=9.65..9.65 rows=565 width=8) (actual time=0.622..0.622 rows=565 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 22kB -> Seq Scan on advertcategory test_cat (cost=0.00..9.65 rows=565 width=8) (actual time=0.005..0.327 rows=565 loops=1) -> Index Scan using advert_pkey on advert a0_ (cost=0.00..6.17 rows=1 width=16) (actual time=0.117..0.118 rows=1 loops=220) Index Cond: (id = test.id) -> Hash (cost=9.65..9.65 rows=565 width=4) (actual time=0.604..0.604 rows=565 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 20kB -> Seq Scan on advertcategory a1_ (cost=0.00..9.65 rows=565 width=4) (actual time=0.010..0.285 rows=565 loops=1) Total runtime: 524.797 ms
广告表(向下细分):
353804 rows Table "public.advert" Column | Type | Modifiers | Storage | Stats target | Description -----------------------------+--------------------------------+-----------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('advert_id_seq'::regclass) | plain | | advert_category_id | integer | not null | plain | | Indexes: "idx_54f1f40bd4436821" btree (advert_category_id) "advert_created_date_idx" btree (created_date) Foreign-key constraints: "fk_54f1f40bd4436821" FOREIGN KEY (advert_category_id) REFERENCES advertcategory(id) ON DELETE RESTRICT Has OIDs: no
类别表(向下细分):
565 rows Table "public.advertcategory" Column | Type | Modifiers -----------+---------+------------------------------------------------------------- id | integer | not null default nextval('advertcategory_id_seq'::regclass) parent_id | integer | active | boolean | not null system | boolean | not null Indexes: "advertcategory_pkey" PRIMARY KEY, btree (id) "idx_d84ab8ea727aca70" btree (parent_id) Foreign-key constraints: "fk_d84ab8ea727aca70" FOREIGN KEY (parent_id) REFERENCES advertcategory(id) ON DELETE RESTRICT
简短的服务器配置:
version -------------------------------------------------------------------------------------------------------------- PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit name | current_setting | source ----------------------------+--------------------+---------------------- shared_buffers | 1800MB | configuration file work_mem | 4MB | configuration file
如您所见,没有合适的解决方案可以提高速度。仅UNION解决方案可拆分OR语句,可提高性能。但是我不能使用它,因为此查询是通过我的ORM框架以及许多其他过滤器选项使用的。另外,如果我可以做到这一点,为什么优化器不这样做呢?似乎是一个非常简单的优化。
有什么提示吗?对于这个小问题的解决方案将不胜感激!
“此外,如果我可以做到这一点,那么优化器为什么不这样做呢?” -因为在各种情况下,这样做不一定是有效的(由于子查询中的聚集)或有趣(由于更好的索引)。
在Gordon的答案中给出了您可能获得的最佳查询计划,使用union all而不是union避免排序(我认为一个类别永远不是它自己的父类别,从而消除了重复的可能性)。
union all
union
否则,请注意您的查询可以这样重写:
SELECT a0_.id AS id0 FROM advert a0_ INNER JOIN advertcategory a1_ ON a0_.advert_category_id = a1_.id WHERE a1_.id IN ( 1136 ) OR a1_.parent_id IN ( 1136 ) ORDER BY a0_.created_date DESC LIMIT 15;
换句话说,您正在根据一个表中的条件进行过滤,并基于另一表中的排序/限制。您编写它的方式使您无法使用良好的索引,因为计划者无法识别过滤条件是来自同一张表,因此它将像您当前正在使用的过滤器嵌套在created_date之上。请注意,这不是一个坏计划。如果例如1136不是非常严格的标准,这实际上是正确的选择。
通过摆明第二表是感兴趣的一个,你可能最终得到一个位图堆扫描时类别是选择性不够的,如果你已经索引advertcategory (id)(您已经拥有,如果它的主键)和advertcategory (parent_id)(您目前可能还没有)。不过,不要指望太多- 据我所知,PG不会收集相关的列信息。
advertcategory (id)
advertcategory (parent_id)
另一种可能性可能是直接在广告中维护一个具有汇总类别的数组(使用触发器),并在其上使用GIST索引:
SELECT a0_.id AS id0 FROM advert a0_ WHERE ARRAY[1136, 1137] && a0_.category_ids -- 1136 OR 1137; use <@ for AND ORDER BY a0_.created_date DESC LIMIT 15;
从技术上讲,它是多余的,但可以很好地优化此类查询(例如,在嵌套类别树上的过滤器会产生复杂的联接条件)…当PG决定使用它时,您将最终对适用的广告进行前n位排序。(在较早的PG版本中,&&的选择性由于缺乏统计信息而具有任意性;我隐约记得读过一个变更日志,其中9.1、9.2或9.3改进了事情,大概是通过使用类似于tsvector内容统计信息收集器对通用数组类型使用的代码无论如何,请确保使用最新的PG版本,并确保不要使用无法使用gin/gist索引的运算符重写该查询。)
gin/gist