I have a table like this in a Postgres DB:

    person   | eventdate  | type
    ---------+------------+------
    <uuid-1> | 2016-05-14 | 300
    <uuid-3> | 2016-05-14 | 300
    <uuid-1> | 2016-05-15 | 301
    <uuid-1> | 2016-05-16 | 301
    <uuid-1> | 2016-05-18 | 304
    <uuid-1> | 2016-05-22 | 300
    <uuid-2> | 2016-05-22 | 304
    <uuid-2> | 2016-05-27 | 301
    <uuid-1> | 2016-05-30 | 300
    <uuid-1> | 2016-06-01 | 300
    <uuid-2> | 2016-06-15 | 501
    <uuid-2> | 2016-06-16 | 301
    <uuid-4> | 2016-06-16 | 300
    <uuid-5> | 2016-06-20 | 300
    <uuid-1> | 2016-06-21 | 300
    <uuid-2> | 2016-06-21 | 300
    <uuid-2> | 2016-06-23 | 301
    <uuid-2> | 2016-06-30 | 300
    <uuid-3> | 2016-06-30 | 300
    <uuid-4> | 2016-06-30 | 300
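The table contains non-consecutive daily entries, one for each day an employee was absent, with the absence reason recorded as the type. However, an absence period can span several of these days: any absence entry that falls within 5 days of a previous entry of the same type is still considered part of the same absence "period".

I need an output of each employee's absence periods, with the start and end date of each period plus the total number of days in that multi-day period.

This is further complicated by the fact that, for the purposes of this report, different absence types are treated as the same. So in the example above, types 300, 301 and 304 would be treated as the same.

So, from the example above, the following is what I'm after...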
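The grouping rule above boils down to one comparison between an entry and the previous entry for the same person. The question does not say whether a gap of exactly 5 days still counts as "within 5 days"; this minimal sketch assumes it does (difference of at most 5 calendar days). The function name same_period and the parameter max_gap_days are hypothetical, and the type is deliberately not an argument because types are treated as the same for this report.

```python
from datetime import date


def same_period(prev: date, cur: date, max_gap_days: int = 5) -> bool:
    """Hypothetical helper: True if `cur` is within `max_gap_days` calendar
    days of the previous absence entry `prev` (assumed inclusive), i.e. both
    belong to the same absence period. The absence type is ignored."""
    return (cur - prev).days <= max_gap_days


# <uuid-1>: 2016-05-18 -> 2016-05-22 is 4 days apart, so same period
print(same_period(date(2016, 5, 18), date(2016, 5, 22)))
# <uuid-1>: 2016-05-22 -> 2016-05-30 is 8 days apart, so a new period starts
print(same_period(date(2016, 5, 22), date(2016, 5, 30)))
```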
    person   | startdate  | enddate    | days | type
    ---------+------------+------------+------+------
    <uuid-1> | 2016-05-14 | 2016-05-22 |    5 | 300
    <uuid-3> | 2016-05-14 | 2016-05-14 |    1 | 300
    <uuid-2> | 2016-05-22 | 2016-05-27 |    2 | 304
    <uuid-1> | 2016-05-30 | 2016-06-01 |    2 | 300
    <uuid-2> | 2016-06-15 | 2016-06-15 |    1 | 501
    <uuid-2> | 2016-06-16 | 2016-06-16 |    1 | 301
    <uuid-4> | 2016-06-16 | 2016-06-16 |    1 | 300
    <uuid-5> | 2016-06-20 | 2016-06-20 |    1 | 300
    <uuid-1> | 2016-06-21 | 2016-06-21 |    1 | 300
    <uuid-2> | 2016-06-21 | 2016-06-23 |    2 | 300
    <uuid-2> | 2016-06-30 | 2016-06-30 |    1 | 300
    <uuid-3> | 2016-06-30 | 2016-06-30 |    1 | 300
    <uuid-4> | 2016-06-30 | 2016-06-30 |    1 | 300

How can I query this table to get this output?
It is not clear how you determine the type of each period. I picked the smallest one, min(type).
Assuming this basic table definition:
CREATE TABLE tbl (person text, eventdate date, type int);
Basically, I suggest using window functions in two nested subqueries to identify the members of the same period (island). Then aggregate:

    SELECT person, period
         , min(eventdate) AS startdate
         , max(eventdate) AS enddate
         , count(*)       AS days
         , min(type)      AS type
    FROM  (
       SELECT person, eventdate, type
            , count(gap) OVER (PARTITION BY person ORDER BY eventdate) AS period
       FROM  (
          SELECT person, eventdate, type
               , CASE WHEN lag(eventdate) OVER (PARTITION BY person ORDER BY eventdate)
                         > eventdate - 6  -- within 5 days
                      THEN NULL           -- same period
                      ELSE TRUE           -- next period
                 END AS gap
          FROM   tbl
          ) sub
       ) sub
    GROUP  BY person, period
    ORDER  BY person, period;
Result (based on your sample data):

    person   | period | startdate  | enddate    | days | type
    ---------+--------+------------+------------+------+------
    <uuid-1> |      1 | 2016-05-14 | 2016-05-22 |    5 | 300
    <uuid-1> |      2 | 2016-05-30 | 2016-06-01 |    2 | 300
    <uuid-1> |      3 | 2016-06-21 | 2016-06-21 |    1 | 300
    <uuid-2> |      1 | 2016-05-22 | 2016-05-27 |    2 | 301
    <uuid-2> |      2 | 2016-06-15 | 2016-06-23 |    4 | 300
    <uuid-2> |      3 | 2016-06-30 | 2016-06-30 |    1 | 300
    <uuid-3> |      1 | 2016-05-14 | 2016-05-14 |    1 | 300
    <uuid-3> |      2 | 2016-06-30 | 2016-06-30 |    1 | 300
    <uuid-4> |      1 | 2016-06-16 | 2016-06-16 |    1 | 300
    <uuid-4> |      2 | 2016-06-30 | 2016-06-30 |    1 | 300
    <uuid-5> |      1 | 2016-06-20 | 2016-06-20 |    1 | 300
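To follow the two window-function steps outside the database, here is a minimal Python sketch of the same gaps-and-islands logic. It is an illustration, not part of the SQL answer: the function name absence_periods and the parameter max_gap_days are hypothetical, and it assumes all rows fit in memory. It sorts by person and date, starts a new period whenever an entry is more than 5 days after the previous one (the complement of the query's lag(eventdate) > eventdate - 6 test), numbers periods with a running counter per person, and aggregates each period with min/max date, a row count and the minimum type.

```python
from datetime import date
from itertools import groupby

# Sample data from the question (person | eventdate | type).
SAMPLE = """\
<uuid-1> | 2016-05-14 | 300
<uuid-3> | 2016-05-14 | 300
<uuid-1> | 2016-05-15 | 301
<uuid-1> | 2016-05-16 | 301
<uuid-1> | 2016-05-18 | 304
<uuid-1> | 2016-05-22 | 300
<uuid-2> | 2016-05-22 | 304
<uuid-2> | 2016-05-27 | 301
<uuid-1> | 2016-05-30 | 300
<uuid-1> | 2016-06-01 | 300
<uuid-2> | 2016-06-15 | 501
<uuid-2> | 2016-06-16 | 301
<uuid-4> | 2016-06-16 | 300
<uuid-5> | 2016-06-20 | 300
<uuid-1> | 2016-06-21 | 300
<uuid-2> | 2016-06-21 | 300
<uuid-2> | 2016-06-23 | 301
<uuid-2> | 2016-06-30 | 300
<uuid-3> | 2016-06-30 | 300
<uuid-4> | 2016-06-30 | 300
"""

ROWS = [
    (person.strip(), date.fromisoformat(day.strip()), int(typ))
    for person, day, typ in (line.split("|") for line in SAMPLE.splitlines())
]


def absence_periods(rows, max_gap_days=5):
    """Hypothetical helper returning (person, period, startdate, enddate,
    days, type) tuples, mirroring the SQL: gap flag, running count, GROUP BY."""
    result = []
    # PARTITION BY person ORDER BY eventdate
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for person, person_rows in groupby(ordered, key=lambda r: r[0]):
        periods = {}       # period number -> list of (eventdate, type)
        period = 0         # running count of gaps, like count(gap) OVER (...)
        prev_date = None   # plays the role of lag(eventdate)
        for _, eventdate, typ in person_rows:
            # First row, or more than max_gap_days after the previous entry:
            # this is a gap, so the row starts the next period.
            if prev_date is None or (eventdate - prev_date).days > max_gap_days:
                period += 1
            periods.setdefault(period, []).append((eventdate, typ))
            prev_date = eventdate
        for number, members in periods.items():
            dates = [d for d, _ in members]
            result.append((
                person, number,
                min(dates), max(dates),      # startdate, enddate
                len(members),                # count(*) AS days
                min(t for _, t in members),  # min(type) AS type
            ))
    return result


for row in absence_periods(ROWS):
    print(row)
```

Like the query, this counts every row, so a day entered twice for the same person would be counted twice.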
If the same day can be entered multiple times for the same person with different types, and you only want to count distinct days, use: count(DISTINCT eventdate) AS days.
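The difference between count(*) and count(DISTINCT eventdate) only shows up when one person has several rows on the same day. A minimal Python illustration with hypothetical rows for one period of one person (2016-06-21 recorded twice, with types 300 and 301):

```python
from datetime import date

# Hypothetical rows for a single period of one person: (eventdate, type).
rows = [
    (date(2016, 6, 21), 300),
    (date(2016, 6, 21), 301),  # same day, different type
    (date(2016, 6, 23), 301),
]

days_all = len(rows)                                        # like count(*)
days_distinct = len({eventdate for eventdate, _ in rows})  # like count(DISTINCT eventdate)

print(days_all, days_distinct)
```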