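My PostgreSQL database stores the dates of school vacations, public holidays, and weekends so that parents can plan their vacations. School vacations are often adjacent to weekends or public holidays. I want to display the total number of non-school days for a school vacation, including any adjacent weekends or public holidays.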
locations

    SELECT id, name, is_federal_state
    FROM locations
    WHERE is_federal_state = true;

| id | name              | is_federal_state |
|----|-------------------|------------------|
| 2  | Baden-Württemberg | true             |
| 3  | Bayern            | true             |
holiday_or_vacation_types

    SELECT id, name
    FROM holiday_or_vacation_types;

| id | name       |
|----|------------|
| 1  | Herbst     |
| 8  | Wochenende |
"Herbst" is German for "fall" and "Wochenende" is German for "weekend".
periods

    SELECT id, starts_on, ends_on, holiday_or_vacation_type_id
    FROM periods
    WHERE location_id = 2
    ORDER BY starts_on;

| id  | starts_on    | ends_on      | holiday_or_vacation_type_id |
|-----|--------------|--------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 8                           |
I want to `select` all `periods` where `location_id` equals 2, and I want to calculate the duration of each period in days. That can be done with this SQL query:

    SELECT id, starts_on, ends_on,
           (ends_on - starts_on + 1) AS duration,  -- inclusive day count
           holiday_or_vacation_type_id
    FROM periods
    WHERE location_id = 2
    ORDER BY starts_on;

| id  | starts_on    | ends_on      | duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        | 8                           |
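_Anyone looking at a calendar would see that IDs 670 (weekend), 532 (fall vacation), and 533 (fall vacation) are adjacent, so together they add up to a 6-day vacation._ So far I have computed this in application code, but that takes a lot of resources (the actual table contains about 500,000 rows).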
Which SQL query would produce the following output (adding a `real_duration` column)? Is that even possible with SQL?

| id  | starts_on    | ends_on      | duration | real_duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 6             | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 6             | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 6             | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        | 2             | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        | 2             | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        | 2             | 8                           |
Is it also possible to list the adjacent periods in a `part_of_range` field? This would be the result. Can that be done with SQL?

| id  | starts_on    | ends_on      | duration | part_of_range | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 670,532,533   | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 670,532,533   | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 670,532,533   | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        |               | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        |               | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        |               | 8                           |
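This is a gaps-and-islands problem. In this case, you can use `lag()` to see where each island starts, and then take a cumulative sum of those starts to give every island its own group number.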
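The two steps described above can be sketched in isolation as follows. This is only an illustration under stated assumptions: the CTE names `sample_periods` and `flagged` and the column `island_start` are hypothetical, the rows are copied from the question, and the `CASE` plus `sum()` form is an equivalent way of writing a cumulative count of island starts.

```sql
-- Sample rows copied from the question; sample_periods is a hypothetical name.
WITH sample_periods (id, starts_on, ends_on, location_id) AS (
    VALUES (670, DATE '2019-10-26', DATE '2019-10-27', 2),
           (532, DATE '2019-10-28', DATE '2019-10-30', 2),
           (533, DATE '2019-10-31', DATE '2019-10-31', 2),
           (671, DATE '2019-11-02', DATE '2019-11-03', 2)
),
flagged AS (
    SELECT s.*,
           -- 1 when the previous period ended more than one day before this
           -- one starts (a gap), otherwise 0; the first row has no previous
           -- period, so the comparison is NULL and the flag is 0
           CASE WHEN lag(ends_on) OVER w < starts_on - 1 THEN 1 ELSE 0 END AS island_start
    FROM sample_periods s
    WINDOW w AS (PARTITION BY location_id ORDER BY starts_on)
)
SELECT id, starts_on, ends_on, island_start,
       -- running total of island starts = island number
       sum(island_start) OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
FROM flagged
ORDER BY starts_on;
```

With these rows, periods 670, 532, and 533 all get `grp` 0, and period 671 (which follows a one-day gap) gets `grp` 1.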
The final operation is some aggregation (using window functions):

    SELECT p.*,
           (max(ends_on) OVER (PARTITION BY location_id, grp)
            - min(starts_on) OVER (PARTITION BY location_id, grp)) + 1 AS real_duration,
           -- ids of all periods in the same island
           array_agg(p.id) OVER (PARTITION BY location_id, grp) AS part_of_range
    FROM (SELECT p.*,
                 -- running count of island starts = island number
                 count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day')
                     OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
          FROM (SELECT id, starts_on, ends_on, location_id, holiday_or_vacation_type_id,
                       -- end date of the previous period at the same location
                       lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
                FROM periods
               ) p
         ) p;
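To try the approach without touching the real table, here is a self-contained sketch that runs against the six sample rows from the question. The CTE names (`sample_periods`, `flagged`, `grouped`) and the window name `g` are hypothetical. Using `string_agg` together with a `count(*) > 1` check is one possible way to get the comma-separated `part_of_range` format the question asks for, left empty (NULL) for periods that have no neighbours.

```sql
WITH sample_periods (id, starts_on, ends_on, location_id, holiday_or_vacation_type_id) AS (
    VALUES (670, DATE '2019-10-26', DATE '2019-10-27', 2, 8),
           (532, DATE '2019-10-28', DATE '2019-10-30', 2, 1),
           (533, DATE '2019-10-31', DATE '2019-10-31', 2, 1),
           (671, DATE '2019-11-02', DATE '2019-11-03', 2, 8),
           (672, DATE '2019-11-09', DATE '2019-11-10', 2, 8),
           (673, DATE '2019-11-16', DATE '2019-11-17', 2, 8)
),
flagged AS (
    -- end date of the previous period at the same location
    SELECT s.*,
           lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
    FROM sample_periods s
),
grouped AS (
    -- running count of gaps = island number
    SELECT f.*,
           count(*) FILTER (WHERE prev_eo < starts_on - 1)
               OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
    FROM flagged f
)
SELECT id, starts_on, ends_on,
       ends_on - starts_on + 1 AS duration,
       max(ends_on) OVER g - min(starts_on) OVER g + 1 AS real_duration,
       -- list the island's ids only when it has more than one period
       CASE WHEN count(*) OVER g > 1
            THEN string_agg(id::text, ',') OVER g
       END AS part_of_range,
       holiday_or_vacation_type_id
FROM grouped
-- the frame spans the whole island, so every row sees the same aggregates
WINDOW g AS (PARTITION BY location_id, grp
             ORDER BY starts_on
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY starts_on;
```

The explicit frame is there because a window with `ORDER BY` defaults to a frame that ends at the current row, which would make the aggregates running values instead of per-island totals. This sketch, like the query above, compares each period only with the one immediately before it, so it assumes periods at a location do not overlap or nest.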