在此答案之后,我想知道使用PostgreSQL内置全文搜索的最佳方法是,如果我想按等级排序, 并且 只限于匹配的查询。
让我们假设一个非常简单的表。
CREATE TABLE pictures ( id SERIAL PRIMARY KEY, title varchar(300), ... )
管他呢。现在,我想搜索该title字段。首先,我创建一个索引:
title
CREATE INDEX pictures_title ON pictures USING gin(to_tsvector('english', title));
现在我要搜索'small dog'。这有效:
'small dog'
SELECT pictures.id, ts_rank_cd( to_tsvector('english', pictures.title), 'small dog' ) AS score FROM pictures ORDER BY score DESC
但是我真正想要的是:
SELECT pictures.id, ts_rank_cd( to_tsvector('english', pictures.title), to_tsquery('small dog') ) AS score FROM pictures WHERE to_tsvector('english', pictures.title) @@ to_tsquery('small dog') ORDER BY score DESC
或者,这(不起作用-不能score在该WHERE子句中使用):
score
WHERE
SELECT pictures.id, ts_rank_cd( to_tsvector('english', pictures.title), to_tsquery('small dog') ) AS score FROM pictures WHERE score > 0 ORDER BY score DESC
最好的方法是什么?我的问题有很多:
to_tsvector(...)
to_ts...
score > 0
@@
使用@@运算符将利用全文本GIN索引,而对于的测试则score > 0不会。
我在问题中创建了一个表,但是添加了一个名为的列title_tsv:
title_tsv
CREATE TABLE test_pictures ( id BIGSERIAL, title text, title_tsv tsvector ); CREATE INDEX ix_pictures_title_tsv ON test_pictures USING gin(title_tsv);
我在表中填充了一些测试数据:
INSERT INTO test_pictures(title, title_tsv) SELECT T.data, to_tsvector(T.data) FROM some_table T;
然后,我使用以下命令运行先前接受的答案explain analyze:
explain analyze
EXPLAIN ANALYZE SELECT score, id, title FROM ( SELECT ts_rank_cd(P.title_tsv, to_tsquery('address & shipping')) AS score ,P.id ,P.title FROM test_pictures as P ) S WHERE score > 0 ORDER BY score DESC;
并得到以下内容。注意 执行时间为5,015毫秒
QUERY PLAN | ----------------------------------------------------------------------------------------------------------------------------------------------| Gather Merge (cost=274895.48..323298.03 rows=414850 width=60) (actual time=5010.844..5011.330 rows=1477 loops=1) | Workers Planned: 2 | Workers Launched: 2 | -> Sort (cost=273895.46..274414.02 rows=207425 width=60) (actual time=4994.539..4994.555 rows=492 loops=3) | Sort Key: (ts_rank_cd(p.title_tsv, to_tsquery('address & shipping'::text))) DESC | Sort Method: quicksort Memory: 131kB | -> Parallel Seq Scan on test_pictures p (cost=0.00..247776.02 rows=207425 width=60) (actual time=17.672..4993.997 rows=492 loops=3) | Filter: (ts_rank_cd(title_tsv, to_tsquery('address & shipping'::text)) > '0'::double precision) | Rows Removed by Filter: 497296 | Planning time: 0.159 ms | Execution time: 5015.664 ms |
现在将其与@@运算符进行比较:
EXPLAIN ANALYZE SELECT ts_rank_cd(to_tsvector(P.title), to_tsquery('address & shipping')) AS score ,P.id ,P.title FROM test_pictures as P WHERE P.title_tsv @@ to_tsquery('address & shipping') ORDER BY score DESC;
结果 以大约29 ms 的 执行时间进入 :
QUERY PLAN | -------------------------------------------------------------------------------------------------------------------------------------------------| Gather Merge (cost=13884.42..14288.35 rows=3462 width=60) (actual time=26.472..26.942 rows=1477 loops=1) | Workers Planned: 2 | Workers Launched: 2 | -> Sort (cost=12884.40..12888.73 rows=1731 width=60) (actual time=17.507..17.524 rows=492 loops=3) | Sort Key: (ts_rank_cd(to_tsvector(title), to_tsquery('address & shipping'::text))) DESC | Sort Method: quicksort Memory: 171kB | -> Parallel Bitmap Heap Scan on test_pictures p (cost=72.45..12791.29 rows=1731 width=60) (actual time=1.781..17.268 rows=492 loops=3) | Recheck Cond: (title_tsv @@ to_tsquery('address & shipping'::text)) | Heap Blocks: exact=625 | -> Bitmap Index Scan on ix_pictures_title_tsv (cost=0.00..71.41 rows=4155 width=0) (actual time=3.765..3.765 rows=1477 loops=1) | Index Cond: (title_tsv @@ to_tsquery('address & shipping'::text)) | Planning time: 0.214 ms | Execution time: 28.995 ms |
正如您在执行计划中看到的那样,索引ix_pictures_title_tsv是在第二个查询中使用的,而没有在第一个查询中使用,这使@@操作员查询的速度提高了172倍!
ix_pictures_title_tsv