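I want to fill all NULL values with the last known value, within a single query. When the data lives in a table rather than in a query result, this is easy.
If I define and populate a table as follows: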
CREATE TABLE test_fill_null (
    date  INTEGER,
    value INTEGER
);

INSERT INTO test_fill_null VALUES
    (1, 2),
    (2, NULL),
    (3, 45),
    (4, NULL),
    (5, NULL);

SELECT * FROM test_fill_null;

 date | value
------+-------
    1 |     2
    2 |
    3 |    45
    4 |
    5 |
Then I only need to fill it like this:
UPDATE test_fill_null t1
SET value = (
    SELECT t2.value
    FROM test_fill_null t2
    WHERE t2.date <= t1.date
      AND value IS NOT NULL
    ORDER BY t2.date DESC
    LIMIT 1
);

SELECT * FROM test_fill_null;

 date | value
------+-------
    1 |     2
    2 |     2
    3 |    45
    4 |    45
    5 |    45
But now I am working with a query like this:
WITH pre_table AS (
    SELECT id1,
           id2,
           tms,
           CASE
               WHEN tms - lag(tms) OVER w < interval '5 minutes' THEN NULL
               ELSE id2
           END AS group_id
    FROM table0
    WINDOW w AS (PARTITION BY id1 ORDER BY tms)
)
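Here group_id is set to id2 when the previous point is 5 minutes or more away (or when there is no previous point), and to NULL otherwise. The goal is to end up with groups of points that are less than 5 minutes apart from each other, with gaps of more than 5 minutes between groups.
From there I don't know how to proceed. I tried: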
SELECT DISTINCT ON (id1, id2)
       t0.id1,
       t0.id2,
       t0.tms,
       t1.group_id
FROM pre_table t0
LEFT JOIN (
    SELECT id1, tms, group_id
    FROM pre_table t2
    WHERE t2.group_id IS NOT NULL
    ORDER BY tms DESC
) t1 ON t1.tms <= t0.tms
    AND t1.id1 = t0.id1
WHERE t0.id1 IS NOT NULL
ORDER BY id1, id2, t1.tms DESC
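But in the final result I get a group in which two consecutive points are more than 5 minutes apart. In that case they should be two different groups.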
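A "select within a select" is more commonly called a "subselect" or "subquery". In your particular case it is a correlated subquery. A LATERAL join (new in Postgres 9.3) can largely replace correlated subqueries with more flexible solutions.
I don't think you need either here.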
For your first case, this query is probably faster and simpler:
SELECT date,
       max(value) OVER (PARTITION BY grp) AS value
FROM (
    SELECT *,
           count(value) OVER (ORDER BY date) AS grp
    FROM test_fill_null
) sub;
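count() only counts non-null values, so grp is incremented with every non-null value, which forms the groups as desired. It is then trivial to pick the one non-null value per grp in the outer SELECT.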
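To make the grouping trick concrete, here is a minimal sketch in plain Python that mirrors the logic of the query above on the sample rows from test_fill_null. It is only an illustration of how the running count forms groups, not a replacement for the SQL. The variable names grp, group_value and filled are chosen for this sketch.

```python
from itertools import accumulate

# Sample rows from test_fill_null as (date, value), already ordered by date.
rows = [(1, 2), (2, None), (3, 45), (4, None), (5, None)]

# Mirrors count(value) OVER (ORDER BY date): a running count of non-null values.
grp = list(accumulate(1 if value is not None else 0 for _, value in rows))

# Mirrors max(value) OVER (PARTITION BY grp): each group holds at most one
# non-null value, so remembering that value per group is enough.
group_value = {}
for (_, value), g in zip(rows, grp):
    if value is not None:
        group_value[g] = value

filled = [(date, group_value.get(g)) for (date, _), g in zip(rows, grp)]
print(grp)     # [1, 1, 2, 2, 2]
print(filled)  # [(1, 2), (2, 2), (3, 45), (4, 45), (5, 45)]
```

Rows that come before the first non-null value would land in group 0 and stay empty, which matches what the SQL query returns for leading NULLs.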
For the second case, I assume the initial order of rows is determined by (id1, id2, tms), as indicated by one of your queries.
SELECT id1,
       id2,
       tms,
       count(step) OVER (ORDER BY id1, id2, tms) AS group_id
FROM (
    SELECT *,
           CASE
               WHEN lag(tms, 1, '-infinity') OVER (PARTITION BY id1 ORDER BY id2, tms)
                    < tms - interval '5 min'
               THEN true
           END AS step
    FROM table0
) sub
ORDER BY id1, id2, tms;
Adapt this to your actual sort order. One of these might cover it:
PARTITION BY id1 ORDER BY id2  -- ignore tms
PARTITION BY id1 ORDER BY tms  -- ignore id2
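The same step-then-count idea can be sketched in plain Python, which may help when checking the expected group numbers by hand. The sample rows below are hypothetical (they are not from the question), and the sketch assumes the (id1, id2, tms) ordering used in the query above.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=5)

# Hypothetical sample rows as (id1, id2, tms); not taken from the original post.
rows = [
    (1, 1, datetime(2024, 1, 1, 12, 0)),
    (1, 1, datetime(2024, 1, 1, 12, 3)),
    (1, 1, datetime(2024, 1, 1, 12, 10)),
    (1, 1, datetime(2024, 1, 1, 12, 12)),
    (2, 1, datetime(2024, 1, 1, 12, 13)),
]
rows.sort()  # mirrors ORDER BY id1, id2, tms

group_ids = []
group_id = 0
prev_id1, prev_tms = None, None
for id1, _id2, tms in rows:
    # Mirrors lag(tms, 1, '-infinity') OVER (PARTITION BY id1 ...): the first
    # row of a partition has no predecessor, so it always starts a new group.
    new_partition = id1 != prev_id1
    if new_partition or prev_tms < tms - GAP:
        group_id += 1  # mirrors count(step): running count of flagged rows
    group_ids.append(group_id)
    prev_id1, prev_tms = id1, tms

print(group_ids)  # [1, 1, 2, 2, 3]
```

The third row starts group 2 because it follows its predecessor by 7 minutes, and the last row starts group 3 because it belongs to a different id1.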
SQL Fiddle with an extended example.