I have to work with this periods table:
 id  | starts_on  |  ends_on
-----+------------+------------
 678 | 2019-12-21 | 2019-12-22
 534 | 2019-12-23 | 2020-01-04
 679 | 2019-12-28 | 2019-12-29
   9 | 2020-01-01 | 2020-01-01
 776 | 2020-01-04 | 2020-01-05
   7 | 2020-01-06 | 2020-01-06
 777 | 2020-01-11 | 2020-01-12
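It lists all the periods during which students do not have to go to school. Unfortunately, some of the periods overlap. This happens when a weekend or a public holiday falls within a school vacation (each of them has its own periods row).

With help from earlier questions on finding rows with adjoining date ranges and accumulating their durations, and on vacations and school-free days in a country with federal states, I ended up with the following query: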
SELECT p.id,
       p.starts_on,
       p.ends_on,
       grp,
       (Max(ends_on) OVER (PARTITION BY grp)
        - Min(starts_on) OVER (PARTITION BY grp)) + 1 AS duration,
       Array_agg(p.id) OVER (PARTITION BY grp)
FROM   (SELECT p.*,
               Count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day')
                   OVER (PARTITION BY 1 ORDER BY starts_on) AS grp
        FROM   (SELECT p.*,
                       lag(ends_on) OVER (PARTITION BY 1 ORDER BY starts_on) AS prev_eo
                FROM   (SELECT p.id,
                               p.starts_on,
                               p.ends_on
                        FROM   periods p
                        WHERE  starts_on > '2019-12-15'
                          AND  starts_on < '2020-01-15'
                       ) p
               ) p
       ) p;
The result is:
 id  | starts_on  |  ends_on   | grp | duration |   array_agg
-----+------------+------------+-----+----------+---------------
 678 | 2019-12-21 | 2019-12-22 |   0 |       15 | {678,534,679}
 534 | 2019-12-23 | 2020-01-04 |   0 |       15 | {678,534,679}
 679 | 2019-12-28 | 2019-12-29 |   0 |       15 | {678,534,679}
   9 | 2020-01-01 | 2020-01-01 |   1 |        1 | {9}
 776 | 2020-01-04 | 2020-01-05 |   2 |        3 | {776,7}
   7 | 2020-01-06 | 2020-01-06 |   2 |        3 | {776,7}
 777 | 2020-01-11 | 2020-01-12 |   3 |        2 | {777}
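The first three rows form grp 0 (ids 678, 534 and 679). But ids 9, 776 and 7 should belong to that grp as well, because they overlap with or directly adjoin it too. Is it possible to get a result like this (I don't care about the order)?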
 id  | starts_on  |  ends_on   | grp | duration |       array_agg
-----+------------+------------+-----+----------+------------------------
 678 | 2019-12-21 | 2019-12-22 |   0 |       17 | {678,534,679,9,776,7}
 534 | 2019-12-23 | 2020-01-04 |   0 |       17 | {678,534,679,9,776,7}
 679 | 2019-12-28 | 2019-12-29 |   0 |       17 | {678,534,679,9,776,7}
   9 | 2020-01-01 | 2020-01-01 |   0 |       17 | {678,534,679,9,776,7}
 776 | 2020-01-04 | 2020-01-05 |   0 |       17 | {678,534,679,9,776,7}
   7 | 2020-01-06 | 2020-01-06 |   0 |       17 | {678,534,679,9,776,7}
 777 | 2020-01-11 | 2020-01-12 |   1 |        2 | {777}
I want to know the total length in days of the whole island (grp 0) and the ids of the periods it contains.

Sandbox: https://rextester.com/SHVL41709
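This is an interesting variation on your other question. The problem is that lag() only looks at the immediately preceding row to check for an overlap. Instead, you want to look at all preceding rows.

Fortunately, you can use a cumulative max() for this: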
SELECT p.id,
       p.starts_on,
       p.ends_on,
       grp,
       (Max(ends_on) OVER (PARTITION BY grp)
        - Min(starts_on) OVER (PARTITION BY grp)) + 1 AS duration,
       Array_agg(p.id) OVER (PARTITION BY grp)
FROM   (SELECT p.*,
               Count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day')
                   OVER (PARTITION BY 1 ORDER BY starts_on) AS grp
        FROM   (SELECT p.*,
                       MAX(ends_on) OVER (ORDER BY starts_on
                                          ROWS BETWEEN UNBOUNDED PRECEDING
                                                   AND 1 PRECEDING) AS prev_eo
                FROM   (SELECT p.id,
                               p.starts_on,
                               p.ends_on
                        FROM   periods p
                        WHERE  starts_on > '2019-12-15'
                          AND  starts_on < '2020-01-15'
                       ) p
               ) p
       ) p;
I'm not sure what PARTITION BY 1 is supposed to do, so I left it out of the running-max window.
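If only one row per island is needed (the question asks for the island's length in days and the period ids it contains), the same running-max grouping can be collapsed with GROUP BY. This is a minimal sketch, not part of the original answer; the aliases island_start, island_end and ids are illustrative names, and the tie-breaker on id assumes id is unique.

```sql
-- Sketch: one row per island instead of one row per period.
SELECT grp,
       MIN(starts_on)                     AS island_start,
       MAX(ends_on)                       AS island_end,
       MAX(ends_on) - MIN(starts_on) + 1  AS duration,   -- date - date yields an integer number of days
       ARRAY_AGG(id ORDER BY starts_on, id) AS ids
FROM   (SELECT p.*,
               -- a new island starts when every earlier period ended
               -- more than one day before this one starts
               COUNT(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day')
                   OVER (ORDER BY starts_on, id) AS grp
        FROM   (SELECT p.*,
                       -- latest end date among all preceding rows
                       MAX(ends_on) OVER (ORDER BY starts_on, id
                                          ROWS BETWEEN UNBOUNDED PRECEDING
                                                   AND 1 PRECEDING) AS prev_eo
                FROM   periods p
                WHERE  starts_on > '2019-12-15'
                  AND  starts_on < '2020-01-15'
               ) p
       ) p
GROUP  BY grp
ORDER  BY grp;
```

On the sample data this should yield two islands: one of 17 days containing ids 678, 534, 679, 9, 776 and 7, and one of 2 days containing id 777.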
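Anticipating your next question, here is one caveat: the cumulative maximum is not stable when start dates are equal. In that case you either want to remove the duplicates or make the sort for the cumulative maximum stable.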
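One way to make the sort stable is to add a unique tie-breaker to the ORDER BY of both windows. The sketch below assumes that id is unique in periods and uses it as the tie-breaker; the alias ids is an illustrative name. It is one possible approach under those assumptions, not the only one.

```sql
-- Sketch: same query, with (starts_on, id) as a deterministic sort key.
SELECT p.id,
       p.starts_on,
       p.ends_on,
       grp,
       (MAX(ends_on) OVER (PARTITION BY grp)
        - MIN(starts_on) OVER (PARTITION BY grp)) + 1 AS duration,
       ARRAY_AGG(p.id) OVER (PARTITION BY grp)        AS ids
FROM   (SELECT p.*,
               COUNT(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day')
                   OVER (ORDER BY starts_on, id) AS grp
        FROM   (SELECT p.*,
                       -- rows with equal starts_on are now ordered by id,
                       -- so "1 PRECEDING" always refers to the same row
                       MAX(ends_on) OVER (ORDER BY starts_on, id
                                          ROWS BETWEEN UNBOUNDED PRECEDING
                                                   AND 1 PRECEDING) AS prev_eo
                FROM   periods p
                WHERE  starts_on > '2019-12-15'
                  AND  starts_on < '2020-01-15'
               ) p
       ) p;
```

Using the same sort key in both windows keeps the row order consistent between the running maximum and the island counter.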