I have a table like this:
ID BEGIN END
If episodes for the same ID overlap (e.g. 2000-01-01 to 2001-12-31 and 2000-06-01 to 2002-06-30), I want the rows merged using MIN(BEGIN) and MAX(END).
The same should happen if the episodes are directly consecutive (e.g. 2000-01-01 to 2000-06-30 and 2000-07-01 to 2000-12-31).
If there are "missing" days between the episodes (e.g. 2000-01-01 to 2000-06-15 and 2000-07-01 to 2000-12-31), they should not be merged.
How can this be achieved?
Currently my code looks like this:
SELECT "ID", MIN("BEGIN"), MAX("END") FROM ... GROUP BY "ID"
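But of course this does not satisfy the last condition (no merging if there are "missing" days).
Thanks in advance!
[Edit]
I am working on a solution in which I join the table with itself. It is an improvement, but it is not finished yet. I think the other suggestions are better (but more complicated). Still, I want to share my unfinished work: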
    SELECT "ID",
           LEAST(tab1."BEGIN", tab2."BEGIN"),
           GREATEST(tab1."END", tab2."END")
    FROM <mytable> AS tab1
    JOIN <mytable> AS tab2
      ON tab1."ID" = tab2."ID"
     AND (tab1."BEGIN", tab1."END" + INTERVAL '2 day') OVERLAPS (tab2."BEGIN", tab2."END")
    ORDER BY "ID"
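[Edit 2]
Thank you for your help!
So far I have been trying to figure out how window functions and WITH queries work, until I realized that my database runs on PostgreSQL 8.3 (which supports neither). Is there a way to do this without window functions and WITH queries?
Thanks again!
[Edit 3]
Sample data: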
    ID BEGIN END
    1;"2000-01-01";"2000-03-31"
    1;"2000-04-01";"2000-05-31"
    1;"2000-04-15";"2000-07-31"
    1;"2000-09-01";"2000-10-31"
    2;"2000-02-01";"2000-03-15"
    2;"2000-01-15";"2000-03-31"
    2;"2000-04-01";"2000-04-15"
    3;"2000-06-01";"2000-06-15"
    3;"2000-07-01";"2000-07-15"
Sample output:
    ID BEGIN END
    1;"2000-01-01";"2000-07-31"
    1;"2000-09-01";"2000-10-31"
    2;"2000-01-15";"2000-04-15"
    3;"2000-06-01";"2000-06-15"
    3;"2000-07-01";"2000-07-15"
[Edit 4]
One possible solution:
    -- "begin", "end" and "time" are quoted because END is a reserved word in PostgreSQL
    WITH t1 AS (
        -- every begin and end date becomes one point on a per-id timeline
        SELECT id, "begin" AS "time" FROM "nace-8510-test".checkfkt
        UNION ALL
        SELECT id, "end" FROM "nace-8510-test".checkfkt
    ),
    t2 AS (
        SELECT Row_Number() OVER (PARTITION BY id ORDER BY "time") AS num, id, "time"
        FROM t1 AS t1_1
    ),
    t3 AS (
        -- keep only neighbouring points whose gap is covered by an episode,
        -- or that are identical or exactly one day apart
        SELECT t2_1.num - Row_Number() OVER (PARTITION BY t2_1.id ORDER BY t2_1."time", t2_2."time") AS num1,
               t2_1.id,
               t2_1."time" AS "begin",
               t2_2."time" AS "end"
        FROM t2 AS t2_1
        INNER JOIN t2 AS t2_2
                ON t2_1.id = t2_2.id
               AND t2_1.num = t2_2.num - 1
        WHERE EXISTS (
                  SELECT *
                  FROM "nace-8510-test".checkfkt AS s
                  WHERE s.id = t2_1.id
                    AND (s."begin" < t2_2."time" AND s."end" > t2_1."time")
              )
           OR t2_1."time" = t2_2."time"
           OR t2_1."time" + INTERVAL '1 day' = t2_2."time"
    )
    SELECT id, MIN("begin") AS von, MAX("end") AS bis
    FROM t3
    GROUP BY id, num1
    ORDER BY id
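Edit: It is great news that your DBA agreed to upgrade to a newer version of PostgreSQL. The window functions alone make the upgrade a worthwhile investment.
My original answer had a major flaw: it was limited to one row per id. Below is a better solution without that limitation. I have tested it with test tables on my system (8.4).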
If you have time, I would like to know how it performs on your data. I have also written up an explanation here: https://www.mechanical-meat.com/1/detail
    WITH RECURSIVE t1_rec ( id, "begin", "end", n ) AS (
        SELECT id, "begin", "end", n
        FROM (
            SELECT id, "begin", "end",
                   CASE
                       WHEN LEAD("begin") OVER (PARTITION BY id ORDER BY "begin")
                            <= ("end" + interval '2' day)
                       THEN 1
                       ELSE 0
                   END AS cl,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY "begin") AS n
            FROM mytable
        ) s
        WHERE s.cl = 1
        UNION ALL
        SELECT p1.id, p1."begin", p1."end", a.n
        FROM t1_rec a
        JOIN mytable p1
          ON p1.id = a.id
         AND p1."begin" > a."begin"
         AND (a."begin", a."end" + interval '2' day) OVERLAPS (p1."begin", p1."end")
    )
    SELECT t1.id, min(t1."begin"), max(t1."end")
    FROM t1_rec t1
    LEFT JOIN t1_rec t2
           ON t1.id = t2.id
          AND t2."end" = t1."end"
          AND t2.n < t1.n
    WHERE t2.n IS NULL
    GROUP BY t1.id, t1.n
    ORDER BY t1.id, t1.n;
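The original (deprecated) answer follows. Note: it is limited to one row per id.
Denis is probably right about using lead() and lag(), but there is another way! You can also solve this with so-called recursive SQL. The OVERLAPS function also comes in handy.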
I have fully tested this solution on my system (8.4). It works well.
    -- "begin" and "end" are quoted because END is a reserved word in PostgreSQL
    WITH RECURSIVE rec_stmt ( id, "begin", "end" ) AS (
        /* seed statement: start with only first start and end dates for each id */
        SELECT id, MIN("begin"), MIN("end")
        FROM mytable seed_stmt
        GROUP BY id
        UNION ALL
        /* iterative (not really recursive) statement: append qualifying rows to resultset */
        SELECT t1.id, t1."begin", t1."end"
        FROM rec_stmt r
        JOIN mytable t1
          ON t1.id = r.id
         AND t1."begin" > r."end"
         AND (r."begin", r."end" + INTERVAL '1' DAY) OVERLAPS (t1."begin" - INTERVAL '1' DAY, t1."end")
    )
    SELECT MIN("begin"), MAX("end")
    FROM rec_stmt
    GROUP BY id;
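On the PostgreSQL 8.3 question (no window functions, no WITH queries): here is a minimal sketch that uses only NOT EXISTS subqueries and a join, so it should not need either feature. It is not one of the solutions above and I have not run it on 8.3. It assumes a table named mytable with an integer id and DATE columns "begin" and "end" (so that "end" + 1 means "the next day"), and it treats episodes as mergeable when the next one begins no later than one day after the previous one ends. The temporary table is only there to make the example self-contained.

```sql
-- Hypothetical test table holding the sample data from the question.
CREATE TEMP TABLE mytable (id integer, "begin" date, "end" date);

INSERT INTO mytable (id, "begin", "end") VALUES
    (1, '2000-01-01', '2000-03-31'),
    (1, '2000-04-01', '2000-05-31'),
    (1, '2000-04-15', '2000-07-31'),
    (1, '2000-09-01', '2000-10-31'),
    (2, '2000-02-01', '2000-03-15'),
    (2, '2000-01-15', '2000-03-31'),
    (2, '2000-04-01', '2000-04-15'),
    (3, '2000-06-01', '2000-06-15'),
    (3, '2000-07-01', '2000-07-15');

SELECT s.id, s."begin", MIN(e."end") AS "end"
FROM (
    -- Island starts: no earlier episode of the same id overlaps
    -- or directly precedes this begin date.
    SELECT DISTINCT t1.id, t1."begin"
    FROM mytable t1
    WHERE NOT EXISTS (
        SELECT 1
        FROM mytable t2
        WHERE t2.id = t1.id
          AND t2."begin" < t1."begin"
          AND t2."end" + 1 >= t1."begin"
    )
) s
JOIN (
    -- Island ends: no later-ending episode of the same id overlaps
    -- or directly follows this end date.
    SELECT DISTINCT t1.id, t1."end"
    FROM mytable t1
    WHERE NOT EXISTS (
        SELECT 1
        FROM mytable t2
        WHERE t2.id = t1.id
          AND t2."end" > t1."end"
          AND t2."begin" <= t1."end" + 1
    )
) e
  ON e.id = s.id
 AND e."end" >= s."begin"
-- The first island end at or after each island start closes that island.
GROUP BY s.id, s."begin"
ORDER BY s.id, s."begin";
```

Working through the sample data by hand, this should give the five rows listed in the sample output. The correlated subqueries may be slow on a large table; an index on (id, "begin", "end") would probably help, but that is a guess rather than something I have measured.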