小编典典

Merge DATE rows if episodes are in direct succession or overlapping

sql

I have a table like this:

ID    BEGIN    END

If episodes with the same ID overlap (e.g. 2000-01-01 to 2001-12-31 and 2000-06-01 to 2002-06-30), I want to merge the rows, using MIN(BEGIN) and MAX(END).

The same should happen if the episodes are in direct succession (e.g. 2000-01-01 to 2000-06-30 and 2000-07-01 to 2000-12-31).

If there are "missing" days between the episodes (e.g. 2000-01-01 to 2000-06-15 and 2000-07-01 to 2000-12-31), they should not be merged.
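Once two episodes of the same ID are ordered by BEGIN, the three rules above reduce to one test: merge if the later episode starts no later than one day after the earlier one ends. Below is a minimal Python sketch of that test. It assumes inclusive date ranges, and `should_merge` is a hypothetical helper:

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def should_merge(earlier, later):
    """Hypothetical helper: True if `later` overlaps `earlier` or starts the
    day right after it ends. Both are inclusive (begin, end) date pairs, and
    earlier[0] <= later[0] is assumed."""
    return later[0] <= earlier[1] + ONE_DAY


# The three example pairs from the question
overlapping = should_merge((date(2000, 1, 1), date(2001, 12, 31)),
                           (date(2000, 6, 1), date(2002, 6, 30)))
consecutive = should_merge((date(2000, 1, 1), date(2000, 6, 30)),
                           (date(2000, 7, 1), date(2000, 12, 31)))
with_gap = should_merge((date(2000, 1, 1), date(2000, 6, 15)),
                        (date(2000, 7, 1), date(2000, 12, 31)))

print(overlapping, consecutive, with_gap)  # True True False
```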

How can I do this?

Currently, my code looks like this:

SELECT "ID", MIN("BEGIN"), MAX("END")
FROM ...
GROUP BY "ID"

But of course this doesn't satisfy the last condition (no merging if there are "missing" days).
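The snippet below models the plain GROUP BY in Python to show the problem. It uses the "missing days" example from the question and assigns it a hypothetical ID 1. Aggregating per ID collapses both episodes into one span:

```python
from datetime import date

# The "missing days" example from the question, with a hypothetical ID 1
rows = [
    (1, date(2000, 1, 1), date(2000, 6, 15)),
    (1, date(2000, 7, 1), date(2000, 12, 31)),
]

# Same aggregation as SELECT "ID", MIN("BEGIN"), MAX("END") ... GROUP BY "ID"
spans = {}
for id_, begin, end in rows:
    lo, hi = spans.get(id_, (begin, end))
    spans[id_] = (min(lo, begin), max(hi, end))

# One row per ID: 2000-01-01 .. 2000-12-31, so the gap in between is lost
for id_, (lo, hi) in spans.items():
    print(id_, lo, hi)  # 1 2000-01-01 2000-12-31
```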

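Thank you in advance!

[EDIT]

I am working on a solution where I join the table with itself. It's progress, but it's not finished yet. I think the other suggestions are better (but more complicated). Still, I'd like to share my unfinished work: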

SELECT tab1."ID", LEAST(tab1."BEGIN", tab2."BEGIN"), GREATEST(tab1."END", tab2."END")
  FROM <mytable> AS tab1
  JOIN <mytable> AS tab2
    ON tab1."ID" = tab2."ID"
    AND  (tab1."BEGIN", tab1."END" + INTERVAL '2 day') OVERLAPS (tab2."BEGIN", tab2."END")
  ORDER BY tab1."ID"
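Why `'2 day'` rather than `'1 day'`? PostgreSQL treats each side of OVERLAPS as a half-open period (start <= time < end), and adding an interval to a date produces a timestamp at midnight. Below is a rough Python model of that rule. It only covers periods whose start is before their end; PostgreSQL treats zero-length periods specially, and this sketch ignores that case. `overlaps` and `ts` are hypothetical helpers, not library functions:

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)


def overlaps(s1, e1, s2, e2):
    # Model of (s1, e1) OVERLAPS (s2, e2) for s1 < e1 and s2 < e2:
    # two half-open periods [s, e) overlap when each starts before the other ends.
    return s1 < e2 and s2 < e1


def ts(text):
    # A date used in interval arithmetic behaves like midnight of that day
    return datetime.strptime(text, "%Y-%m-%d")


a_begin, a_end = ts("2000-01-01"), ts("2000-06-30")
consecutive = (ts("2000-07-01"), ts("2000-12-31"))  # starts the day after a ends
one_day_gap = (ts("2000-07-02"), ts("2000-12-31"))  # 2000-07-01 is missing

results = {
    pad: (overlaps(a_begin, a_end + pad * DAY, *consecutive),
          overlaps(a_begin, a_end + pad * DAY, *one_day_gap))
    for pad in (1, 2)
}
print(results)  # {1: (False, False), 2: (True, False)}
```

With `+ 1 day` the directly consecutive episode is missed. With `+ 2 days` it is caught, while a single missing day still keeps the episodes apart. Shifting the end forward by one day and the other period's start back by one day has the same effect.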

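[EDIT 2]

Thanks for your help!

So far I have been trying to figure out how window functions and WITH queries work, until I realized that my database runs on PostgreSQL 8.3, which supports neither. Is there a way to do this without window functions and WITH queries?

Thanks again!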

[EDIT 3]

Sample data:

ID        BEGIN         END
1;"2000-01-01";"2000-03-31"  
1;"2000-04-01";"2000-05-31"  
1;"2000-04-15";"2000-07-31"  
1;"2000-09-01";"2000-10-31"  
2;"2000-02-01";"2000-03-15"  
2;"2000-01-15";"2000-03-31"  
2;"2000-04-01";"2000-04-15"  
3;"2000-06-01";"2000-06-15"  
3;"2000-07-01";"2000-07-15"

Sample output:

ID        BEGIN         END
1;"2000-01-01";"2000-07-31"
1;"2000-09-01";"2000-10-31"
2;"2000-01-15";"2000-04-15"
3;"2000-06-01";"2000-06-15"
3;"2000-07-01";"2000-07-15"
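A plain sort-and-sweep reference implementation is handy for checking any query against this sample. The Python sketch below is not SQL and does not depend on a PostgreSQL version. It sorts by ID and BEGIN, then extends the current episode while the next one starts no later than one day after the current END. `merge_episodes` is a hypothetical name:

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

SAMPLE = [
    (1, "2000-01-01", "2000-03-31"),
    (1, "2000-04-01", "2000-05-31"),
    (1, "2000-04-15", "2000-07-31"),
    (1, "2000-09-01", "2000-10-31"),
    (2, "2000-02-01", "2000-03-15"),
    (2, "2000-01-15", "2000-03-31"),
    (2, "2000-04-01", "2000-04-15"),
    (3, "2000-06-01", "2000-06-15"),
    (3, "2000-07-01", "2000-07-15"),
]


def merge_episodes(rows):
    """Merge (id, begin, end) rows that overlap or are directly consecutive."""
    merged = []
    for id_, begin, end in sorted(rows):
        last = merged[-1] if merged else None
        if last and last[0] == id_ and begin <= last[2] + ONE_DAY:
            # Same id and no missing day: extend the current episode
            merged[-1] = (id_, last[1], max(last[2], end))
        else:
            # New id or a gap of at least one day: start a new episode
            merged.append((id_, begin, end))
    return merged


rows = [(i, date.fromisoformat(b), date.fromisoformat(e)) for i, b, e in SAMPLE]
for id_, begin, end in merge_episodes(rows):
    print(f'{id_};"{begin}";"{end}"')
```

It prints the five rows of the sample output in the same `ID;"BEGIN";"END"` layout.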

[EDIT 4]

One possible solution:

WITH
  -- t1: every begin and end date as a boundary point
  t1 AS (
    SELECT id, "begin" AS time
      FROM "nace-8510-test".checkfkt
    UNION ALL
    SELECT id, "end"
      FROM "nace-8510-test".checkfkt
  ),

  -- t2: number the boundary points per id in time order
  t2 AS (
    SELECT Row_Number() OVER(PARTITION BY id ORDER BY time) AS num, id, time
      FROM t1 AS t1_1
  ),

  -- t3: pairs of consecutive points that lie inside some episode, coincide,
  -- or are exactly one day apart; num - Row_Number() is constant within
  -- each unbroken run of such pairs
  t3 AS (
    SELECT t2_1.num - Row_Number() OVER(PARTITION BY t2_1.id ORDER BY t2_1.num) AS num1,
        t2_1.id, t2_1.time AS "begin", t2_2.time AS "end"
      FROM t2 AS t2_1
        INNER JOIN t2 AS t2_2
          ON t2_1.id = t2_2.id
            AND t2_1.num = t2_2.num - 1
      WHERE
        EXISTS (
          SELECT *
            FROM "nace-8510-test".checkfkt AS s
            WHERE s.id = t2_1.id
              AND (s."begin" < t2_2.time AND s."end" > t2_1.time)
        )
        OR t2_1.time = t2_2.time
        OR t2_1.time + INTERVAL '1 day' = t2_2.time
  )

SELECT id, MIN("begin") AS von, MAX("end") AS bis
  FROM t3
  GROUP BY id, num1
  ORDER BY id
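The query above works in three steps:

- collect every BEGIN and END as a boundary point (t1);
- number the points per ID (t2);
- keep pairs of consecutive points that are covered by an episode, coincide, or are one day apart (t3).

Within an unbroken run of kept pairs, `num - Row_Number()` stays constant, and `GROUP BY id, num1` relies on that. The Python sketch below walks through the same steps for the episodes of one ID. `merge_via_boundaries` is a hypothetical name:

```python
from datetime import date, timedelta
from itertools import groupby

ONE_DAY = timedelta(days=1)


def merge_via_boundaries(episodes):
    """Sketch of the t1/t2/t3 query for one id; episodes are (begin, end) dates."""
    # t1 + t2: all begin and end dates as boundary points, numbered in time order
    points = sorted([b for b, _ in episodes] + [e for _, e in episodes])

    # t3: pairs of consecutive points, kept if an episode covers the stretch
    # between them, or if they coincide or are exactly one day apart
    kept = []
    for num in range(len(points) - 1):
        lo, hi = points[num], points[num + 1]
        covered = any(b < hi and e > lo for b, e in episodes)
        if covered or lo == hi or lo + ONE_DAY == hi:
            kept.append((num, lo, hi))

    # num - row_number() is constant within each unbroken run of kept pairs
    keyed = [(num - rn, lo, hi) for rn, (num, lo, hi) in enumerate(kept)]

    # Final SELECT: MIN(begin), MAX(end) per run
    merged = []
    for _, run in groupby(keyed, key=lambda t: t[0]):
        run = list(run)
        merged.append((min(t[1] for t in run), max(t[2] for t in run)))
    return merged


id_1 = [(date(2000, 1, 1), date(2000, 3, 31)), (date(2000, 4, 1), date(2000, 5, 31)),
        (date(2000, 4, 15), date(2000, 7, 31)), (date(2000, 9, 1), date(2000, 10, 31))]
print(merge_via_boundaries(id_1))
```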

2021-04-07

1 Answer


EDIT: Great news that your DBA agreed to upgrade to a newer version of PostgreSQL. The window functions alone make the upgrade a worthwhile investment.

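My original answer had a major flaw: it was limited to one row per id.
Below is a better solution without that limitation.
I have tested it with a test table on my system (8.4).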

If you have time, I'd be interested to know how it performs on your data.
I also wrote an explanation here: https://www.mechanical-meat.com/1/detail

WITH RECURSIVE t1_rec ( id, "begin", "end", n ) AS (
    -- Seed: every episode, numbered by "begin" within its id
    -- (seeding only rows whose next episode is adjacent would drop
    -- isolated episodes such as those of id 3 in the sample)
    SELECT id, "begin", "end",
           ROW_NUMBER() OVER (
               PARTITION BY    id
               ORDER BY        "begin") AS n
      FROM mytable
  UNION ALL
    -- Recursive step: follow later episodes that overlap the current one
    -- or start on the day after it ends
    SELECT p1.id, p1."begin", p1."end", a.n
      FROM t1_rec a
           JOIN mytable p1 ON p1.id = a.id
       AND p1."begin" > a."begin"
       AND (a."begin",  a."end" + interval '2' day) OVERLAPS
           (p1."begin", p1."end")
)
-- Keep only the chain started by the first episode of each island:
-- a row whose "end" already occurs in a chain with a smaller n is dropped
SELECT t1.id, min(t1."begin") AS "begin", max(t1."end") AS "end"
  FROM t1_rec t1
       LEFT JOIN t1_rec t2 ON t1.id = t2.id
       AND t2."end" = t1."end"
       AND t2.n < t1.n
 WHERE t2.n IS NULL
 GROUP BY t1.id, t1.n
 ORDER BY t1.id, t1.n;
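The recursive query combines two ideas:

- Starting from an episode, the recursive term repeatedly pulls in later episodes that overlap it or start the next day.
- The `LEFT JOIN ... IS NULL` anti-join on "end" and n keeps only the chain that starts at the first episode of each island.

Below is a small Python sketch of that idea for one ID. It assumes every episode is used as a starting point, and `follows` and `merge_recursive` are hypothetical names:

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def follows(a, b):
    # Like: b.begin > a.begin AND (a.begin, a.end + 2 days) OVERLAPS (b.begin, b.end).
    # For whole dates: b starts after a starts and no later than a.end + 1 day.
    return b[0] > a[0] and b[0] <= a[1] + ONE_DAY


def merge_recursive(episodes):
    """Sketch for one id; episodes are (begin, end) date pairs."""
    rows = sorted(episodes)  # list position plays the role of n
    chains = []
    for seed in rows:
        chain, frontier = [seed], [seed]
        while frontier:  # one pass per iteration of the recursive term
            frontier = [r for a in frontier for r in rows if follows(a, r)]
            chain += frontier
        chains.append(chain)

    merged, earlier_ends = [], set()
    for chain in chains:
        # Anti-join: drop rows whose end already occurs in a chain with smaller n
        kept = [r for r in chain if r[1] not in earlier_ends]
        if kept:
            merged.append((min(r[0] for r in kept), max(r[1] for r in kept)))
        earlier_ends.update(r[1] for r in chain)
    return merged


id_1 = [(date(2000, 1, 1), date(2000, 3, 31)), (date(2000, 4, 1), date(2000, 5, 31)),
        (date(2000, 4, 15), date(2000, 7, 31)), (date(2000, 9, 1), date(2000, 10, 31))]
print(merge_recursive(id_1))
```

Seeding from every row costs extra work, because later seeds inside an island rebuild chains that the anti-join then throws away. In return, isolated episodes are still reported.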

Original (deprecated) answer below;
note: it is limited to one row per id


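Denis may be right about using lead() and lag(), but there is another way!
You can also solve this problem with so-called recursive SQL.
The OVERLAPS operator also comes in handy.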

I have fully tested this solution on my system (8.4).
It works well.

WITH RECURSIVE rec_stmt ( id, "begin", "end" ) AS (
    /* seed statement:
           start with the earliest episode (by "begin") for each id
    */
      SELECT seed_stmt.id, seed_stmt."begin", seed_stmt."end"
        FROM mytable seed_stmt
       WHERE seed_stmt."begin" = (SELECT MIN(m."begin")
                                    FROM mytable m
                                   WHERE m.id = seed_stmt.id)

    UNION ALL

    /* iterative (not really recursive) statement:
           append later rows that overlap or directly follow the current one
    */
      SELECT t1.id, t1."begin", t1."end"
        FROM rec_stmt r
             JOIN mytable t1 ON t1.id = r.id
         AND t1."begin" > r."begin"
         AND (r."begin", r."end" + INTERVAL '1' DAY) OVERLAPS
             (t1."begin" - INTERVAL '1' DAY, t1."end")
)
  SELECT id, MIN("begin"), MAX("end")
    FROM rec_stmt
GROUP BY id;