我有以下查询,它为每个查询获取id最新的N :observations``station
id
observations``station
SELECT id FROM ( SELECT station_id, id, created_at, row_number() OVER(PARTITION BY station_id ORDER BY created_at DESC) AS rn FROM ( SELECT station_id, id, created_at FROM observations ) s ) s WHERE rn <= #{n} ORDER BY station_id, created_at DESC;
我有指标的id,station_id,created_at。
station_id
created_at
这是我想出的唯一解决方案,每个站最多可以获取单个记录。但是,它非常慢(81000条记录的表为154.0毫秒)。
如何加快查询速度?
假设至少Postgres 9.3。
首先,多列索引将有助于:
CREATE INDEX observations_special_idx ON observations(station_id, created_at DESC, id)
created_at DESC的拟合度稍好一些,但如果不使用,索引仍将以几乎相同的速度向后扫描DESC。
created_at DESC
DESC
假设created_at已定义NOT NULL,否则DESC NULLS LAST在索引 和 查询中考虑:
NOT NULL
DESC NULLS LAST
最后一列id仅在从中获得仅索引的扫描时才有用,如果不断添加许多新行,则可能无法使用。在这种情况下,请id从索引中删除。
简化查询,内部子选择无济于事:
SELECT id FROM ( SELECT station_id, id, created_at , row_number() OVER (PARTITION BY station_id ORDER BY created_at DESC) AS rn FROM observations ) s WHERE rn <= #{n} -- your limit here ORDER BY station_id, created_at DESC;
应该会快一点,但仍然很慢。
为了 真正 快,您需要等效的 宽松索引扫描 (尚未在Postgres中实现)。相关答案:
如果您有一个单独的表stations(似乎是这样),则可以使用 JOIN LATERAL (Postgres 9.3+)进行模拟:
stations
JOIN LATERAL
SELECT o.id FROM stations s CROSS JOIN LATERAL ( SELECT o.id FROM observations o WHERE o.station_id = s.station_id -- lateral reference ORDER BY o.created_at DESC LIMIT #{n} -- your limit here ) o ORDER BY s.station_id, o.created_at DESC;
如果您没有的表格stations,那么下一个最好的办法就是创建并维护一个表格。可能添加外键引用以增强关系完整性。
如果那不是一个选择,则可以动态地提取这样的表。简单的选择是:
~~SELECT DISTINCT station_id FROM observations; SELECT station_id FROM observations GROUP BY 1;~~
但是任何一个都需要顺序扫描并且很慢。使Postgres使用station_id具有 递归CTE的 上述索引(或任何具有前导列的btree索引):
WITH RECURSIVE stations AS ( ( -- extra pair of parentheses ... SELECT station_id FROM observations ORDER BY station_id LIMIT 1 ) -- ... is required! UNION ALL SELECT (SELECT o.station_id FROM observations o WHERE o.station_id > s.station_id ORDER BY o.station_id LIMIT 1) FROM stations s WHERE s.station_id IS NOT NULL -- serves as break condition ) SELECT station_id FROM stations WHERE station_id IS NOT NULL; -- remove dangling row with NULL
将其用作上述简单查询中的表的 替代品stations:
WITH RECURSIVE stations AS ( ( SELECT station_id FROM observations ORDER BY station_id LIMIT 1 ) UNION ALL SELECT (SELECT o.station_id FROM observations o WHERE o.station_id > s.station_id ORDER BY o.station_id LIMIT 1) FROM stations s WHERE s.station_id IS NOT NULL ) SELECT o.id FROM stations s CROSS JOIN LATERAL ( SELECT o.id, o.created_at FROM observations o WHERE o.station_id = s.station_id ORDER BY o.created_at DESC LIMIT #{n} -- your limit here ) o WHERE s.station_id IS NOT NULL ORDER BY s.station_id, o.created_at DESC;
这应该仍然比您 的数量级 要快。