Using a Calendar Table in SQL Server - Part 2


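Problem

In my previous article, I revisited the concept of a calendar table and explained some ways to use this data for business date calculations. This time, I want to explore how a calendar table can simplify generating date ranges, and look at some query challenges it can help you with.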

Solution

Recall that our calendar table has a key column named TheDate:

CREATE TABLE dbo.Calendar
(
  TheDate date NOT NULL,
  …
  CONSTRAINT PK_Calendar PRIMARY KEY (TheDate),
  …

To generate a series of dates from this table, we can use a closed-ended range, like this:

DECLARE @Start date = '20200101', @End date = '20200105';
SELECT TheDate 
  FROM dbo.Calendar
  WHERE TheDate >= @Start
    AND TheDate <= @End;

Results:

TheDate
----------
2020-01-01
2020-01-02
2020-01-03
2020-01-04
2020-01-05

The plan for this query is about as simple as you might imagine:

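[Figure 001.png: clustered index seek for the date range query against the calendar table]

For convenience, though, we can encapsulate this in an inline table-valued function: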

CREATE FUNCTION dbo.GenerateDateSeries
(
  @StartDate        date,
  @EndDate          date
)
RETURNS TABLE
WITH SCHEMABINDING
AS
  RETURN
  (
    SELECT TheDate
      FROM dbo.Calendar
      WHERE TheDate >= @StartDate
        AND TheDate <= @EndDate
  );
GO

Now the same query is easier to write (on its own, or in combination with other queries):

DECLARE @Start date = '20200101', @End date = '20200105';
SELECT TheDate
  FROM dbo.GenerateDateSeries(@Start, @End);

Same results, same plan, same performance.

But how else can we use this function?

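Filling Date Gaps

One of the more common reasons to generate a date range is to fill gaps in a report. Let's create a transactions table in a database called SalesStuff, and populate it with fictitious data that contains intentional gaps: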

USE SalesStuff;
GO
DROP TABLE IF EXISTS dbo.Transactions;
GO
CREATE TABLE dbo.Transactions
(
  KeyColumn   int NOT NULL,
  EventTime   datetime2(3),
  CONSTRAINT  PK_Transactions PRIMARY KEY(KeyColumn),
  INDEX       IX_Transactions_EventTime(EventTime)
);
GO
;WITH x(n) AS
(
  SELECT TOP (40000) ROW_NUMBER() OVER (ORDER BY o.object_id)
    FROM master.sys.all_objects AS o
    CROSS JOIN master.sys.all_objects AS o2
)
INSERT dbo.Transactions(KeyColumn, EventTime)
SELECT TOP (10000) n, TheDate = DATEADD(HOUR, n*2, '20200101')
  FROM x
  WHERE ((n+1)/10) % 3 = 0
  ORDER BY n;

If we just eyeball the transactions table for the first 8 days, we can see that some days have no rows:

SELECT KeyColumn, EventTime
  FROM dbo.Transactions
  WHERE EventTime < '20200109';

Abbreviated results:

KeyColumn   EventTime
---------   ----------------
1           2020-01-01 02:00
…
8           2020-01-01 16:00
29          2020-01-03 10:00
…
35          2020-01-03 22:00
36          2020-01-04 00:00
37          2020-01-04 02:00
38          2020-01-04 04:00
59          2020-01-05 22:00
60          2020-01-06 00:00
…
67          2020-01-06 14:00
68          2020-01-06 16:00
89          2020-01-08 10:00
…

You can see that there is no data for January 2nd or January 7th. But suppose someone runs a simple report to get daily sales for those 8 days:

DECLARE @Start date = '20200101', @End date = '20200108';
SELECT TheDate = CONVERT(date, EventTime), TransactionCount = COUNT(*)
  FROM SalesStuff.dbo.Transactions
  WHERE EventTime >= @Start
    AND EventTime <  DATEADD(DAY, 1, @End)
  GROUP BY CONVERT(date, EventTime)
  ORDER BY TheDate;

The results will have only 6 rows, because the query cannot represent days that have no rows in the table:

TheDate      TransactionCount
----------   ----------------
2020-01-01   8
2020-01-03   7
2020-01-04   3
2020-01-05   1
2020-01-06   9
2020-01-08   7

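Here is the plan for that query, for reference. Note the sort operator; even though the index seek on EventTime returns the column in order, the conversion requires a subsequent sort: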

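[Figure 002.png: execution plan for the daily report query]

If we want to show all days even when there were no sales, we need to find another way to fill the gaps. This is where the date series generator comes in handy: we can pull all the dates from the date series function, then perform a left join against the transaction data.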

DECLARE @Start date = '20200101', @End date = '20200108';
SELECT f.TheDate, TransactionCount = COUNT(t.KeyColumn)
  FROM dbo.GenerateDateSeries(@Start, @End) AS f
  LEFT OUTER JOIN dbo.Transactions AS t
    ON t.EventTime >= f.TheDate
   AND t.EventTime < DATEADD(DAY, 1, f.TheDate)
  GROUP BY f.TheDate
  ORDER BY f.TheDate;

The results show all 8 rows, with the days that have no transactions represented by 0:

TheDate      TransactionCount
----------   ----------------
2020-01-01   8
2020-01-02   0
2020-01-03   7
2020-01-04   3
2020-01-05   1
2020-01-06   9
2020-01-07   0
2020-01-08   7
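
One detail that is easy to miss in the left join approach is why the count is taken over t.KeyColumn rather than with COUNT(*). The following is a minimal sketch, not part of the original example; the table variables @Dates and @Tx are hypothetical stand-ins for the date series function and the transactions table, so the batch can run anywhere:

```sql
-- Hypothetical stand-ins for dbo.GenerateDateSeries and dbo.Transactions.
DECLARE @Dates table (TheDate date PRIMARY KEY);
DECLARE @Tx    table (KeyColumn int PRIMARY KEY, EventTime datetime2(3));

INSERT @Dates(TheDate) VALUES ('20200101'),('20200102'),('20200103');
INSERT @Tx(KeyColumn, EventTime)
  VALUES (1, '2020-01-01 02:00'),
         (2, '2020-01-01 04:00'),
         (3, '2020-01-03 10:00');

SELECT d.TheDate,
       CountStar = COUNT(*),           -- also counts the NULL-extended row
       CountKey  = COUNT(t.KeyColumn)  -- ignores NULLs, so empty days show 0
  FROM @Dates AS d
  LEFT OUTER JOIN @Tx AS t
    ON t.EventTime >= d.TheDate
   AND t.EventTime <  DATEADD(DAY, 1, d.TheDate)
  GROUP BY d.TheDate
  ORDER BY d.TheDate;
```

For January 2nd, which has no transactions, CountStar comes back as 1 while CountKey comes back as 0, because the outer join still produces one row for that date with NULLs on the transaction side.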

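Understandably, the plan here is more complex and has a higher estimated subtree cost, but it is still quite efficient; note that there is no longer a sort operator: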

[Figure 003.png: execution plan for the gap-filling query]

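The cost changes very little between narrow and wide date ranges, but the overall cost may vary depending on the size, statistics, and distribution of the data in the transactions table.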

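You can also use a range of one or a few days to generate a more granular series, without any additional objects, which makes it easy to produce gap-free hourly reports. Say that, for the first three days of January, we want a report with one row for every half hour of each day. We can first generate 48 row numbers from the calendar table, then multiply each row number (less one) by 30 to get the offset, in minutes, of each 30-minute window from midnight.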

DECLARE @minutes int = 30;
DECLARE @Start date = '20200101', @End date = '20200103';
;WITH intervals AS
(
  SELECT TOP (24*60/@minutes) rn = ROW_NUMBER() OVER (ORDER BY TheDate)
    FROM Calendar.dbo.Calendar
),
ranges AS
(
  SELECT TheWindow = DATEADD(MINUTE, (i.rn-1)*@minutes, CONVERT(datetime2(3), f.TheDate))
    FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
    CROSS JOIN intervals AS i
)
SELECT r.TheWindow, TransactionCount = COUNT(t.KeyColumn)
  FROM ranges AS r
  LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
    ON t.EventTime >= r.TheWindow
   AND t.EventTime < DATEADD(MINUTE, @minutes, r.TheWindow)
  GROUP BY r.TheWindow
  ORDER BY r.TheWindow;

Abbreviated results:

TheWindow          TransactionCount
----------------   ----------------
2020-01-01 00:00   0
2020-01-01 00:30   0
2020-01-01 01:00   0
2020-01-01 01:30   0
2020-01-01 02:00   1
…
2020-01-03 21:30   0
2020-01-03 22:00   1
2020-01-03 22:30   0
2020-01-03 23:00   0
2020-01-03 23:30   0

The plan for this query looks like this (using MAXDOP 1):

[Figure 004.png: execution plan for the half-hour interval query]

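Although the calendar table is small, scanning 48 rows for every day is a bit wasteful, so feel free to allow parallelism, or to use whatever technique you already prefer for generating a number series efficiently (this series of posts by Itzik Ben-Gan is excellent).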
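
As one illustration of such a technique, here is a minimal sketch that builds the interval numbers from a small VALUES list instead of scanning the calendar table. This is a commonly used alternative rather than the method used in this article, and the CTE names digits and nums are my own. It also shows the interval arithmetic in isolation for a single, hard-coded day:

```sql
DECLARE @minutes int = 30;

;WITH digits(d) AS
(
  -- Ten rows; cross joining four copies yields up to 10,000 numbers,
  -- more than enough for the 1,440 minutes in a day.
  SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
nums(rn) AS
(
  SELECT CONVERT(int, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))
    FROM digits AS a
    CROSS JOIN digits AS b
    CROSS JOIN digits AS c
    CROSS JOIN digits AS e
)
SELECT rn,
       OffsetMinutes = (rn-1)*@minutes,  -- 0, 30, 60, ... 1410
       TheWindow     = DATEADD(MINUTE, (rn-1)*@minutes,
                               CONVERT(datetime2(3), '20200101'))
  FROM nums
  WHERE rn <= 24*60/@minutes             -- 48 windows when @minutes = 30
  ORDER BY rn;
```

With @minutes = 30 this returns 48 rows, from 2020-01-01 00:00 through 2020-01-01 23:30. Whether it actually outperforms the calendar table scan will depend on your environment, so compare the plans before switching.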

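Identifying Date Gaps and Islands

The queries above help eliminate gaps, but what if we just want to highlight the gaps or the islands? The queries become a bit more complex, but I have had great success with a DENSE_RANK() pattern, and having a calendar table (and a function that pulls from that table) helps keep the query logic manageable.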

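In the transaction data above, we have already established that there are gaps, but let's make the data even sparser by deleting two days of data: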

DELETE SalesStuff.dbo.Transactions
  WHERE EventTime >= '20200104'
    AND EventTime <  '20200106';

Now, when grouped by day, our data should look like this (I have added a label describing each group of rows as the data switches between present and absent):

TheDate      TransactionCount   
----------   ----------------   
2020-01-01   8                  -- island #1
2020-01-02   0                  -- gap #1
2020-01-03   0                  -- gap #1
2020-01-04   0                  -- gap #1
2020-01-05   1                  -- island #2
2020-01-06   9                  -- island #2
2020-01-07   0                  -- gap #2
2020-01-08   7                  -- island #3

Now, the query I often use to return both gaps and islands together (and which produces exactly the previous result set) looks like this:

DECLARE @Start date = '20200101',
        @End   date = '20200108';
;WITH src AS
(
  SELECT f.TheDate, TransactionCount = COUNT(t.KeyColumn)
    FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
    LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
      ON t.EventTime >= f.TheDate
     AND t.EventTime <  DATEADD(DAY, 1, f.TheDate)
    GROUP BY f.TheDate
),
gaps AS
(
  SELECT TheDate,
         TransactionCount,
         gap = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src
  WHERE TransactionCount = 0
),
islands AS
(
  SELECT TheDate,
         TransactionCount,
         island = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src WHERE TransactionCount > 0
),
rawdata(TheDate, TransactionCount, Label) AS
(
  SELECT TheDate,
         TransactionCount,
         Label = 'gap ' + RTRIM(DENSE_RANK() OVER (ORDER BY gap))
  FROM gaps
  UNION ALL
  SELECT TheDate,
         TransactionCount,
         Label = 'island ' + RTRIM(DENSE_RANK() OVER (ORDER BY island))
  FROM islands
)
SELECT TheDate,
       Label,
       TransactionCount
FROM rawdata
ORDER BY TheDate;

This returns the following results:

TheDate      Label      TransactionCount
----------   --------   ----------------
2020-01-01   island 1   8
2020-01-02   gap 1      0
2020-01-03   gap 1      0
2020-01-04   gap 1      0
2020-01-05   island 2   1
2020-01-06   island 2   9
2020-01-07   gap 2      0
2020-01-08   island 3   7
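
To see why subtracting DENSE_RANK() from the date works, it can help to look at the intermediate values. The sketch below is only an illustration: it hard-codes the four gap dates from the results above instead of querying the tables, and the column names rnk and GroupAnchor are my own. Within a run of consecutive dates, the date and the rank both increase by one per row, so their difference stays constant; when a date is skipped, the difference jumps.

```sql
;WITH gapdays(TheDate) AS
(
  -- The four days with no transactions, hard-coded for illustration.
  SELECT CONVERT(date, d)
    FROM (VALUES ('20200102'),('20200103'),('20200104'),('20200107')) AS v(d)
),
anchored AS
(
  SELECT TheDate,
         rnk         = DENSE_RANK() OVER (ORDER BY TheDate),
         GroupAnchor = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
    FROM gapdays
)
SELECT TheDate,
       rnk,
       GroupAnchor,
       [Gap #] = DENSE_RANK() OVER (ORDER BY GroupAnchor)
  FROM anchored
  ORDER BY TheDate;
```

The first three dates (ranks 1 through 3) all map to a GroupAnchor of 2020-01-01, while 2020-01-07 (rank 4) maps to 2020-01-03, so the outer DENSE_RANK() numbers the groups 1, 1, 1, 2.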

If I only want to find the gaps, I can start from the same grouped query and then apply DENSE_RANK() to the result:

DECLARE @Start date = '20200101',
        @End   date = '20200108';
;WITH src AS
(
  SELECT f.TheDate, TransactionCount = COUNT(t.KeyColumn)
    FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
    LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
      ON t.EventTime >= f.TheDate
     AND t.EventTime <  DATEADD(DAY, 1, f.TheDate)
    GROUP BY f.TheDate
),
gaps AS
(
  SELECT TheDate,
         gap = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src
  WHERE TransactionCount = 0
)
SELECT TheDate,
       [Gap #] = DENSE_RANK() OVER (ORDER BY gap)
  FROM gaps
  ORDER BY TheDate;

Results:

TheDate      Gap #
----------   -----
2020-01-02   1
2020-01-03   1
2020-01-04   1
2020-01-07   2

Similarly, if I want to identify the islands, I can flip the WHERE clause in the second CTE:

DECLARE @Start date = '20200101',
        @End   date = '20200108';
;WITH src AS
(
  SELECT f.TheDate, TransactionCount = COUNT(t.KeyColumn)
    FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
    LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
      ON t.EventTime >= f.TheDate
     AND t.EventTime <  DATEADD(DAY, 1, f.TheDate)
    GROUP BY f.TheDate
),
islands AS
(
  SELECT TheDate,
         island = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src
  WHERE TransactionCount > 0
)
SELECT TheDate,
       [Island #] = DENSE_RANK() OVER (ORDER BY island)
  FROM islands
  ORDER BY TheDate;

Results:

TheDate      Island #
----------   --------
2020-01-01   1
2020-01-05   2
2020-01-06   2
2020-01-08   3

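If I want to combine gaps and islands, show each set on a single row, and even highlight the length of each streak and its transaction count, I can group further as follows: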

DECLARE @Start date = '20200101',
        @End   date = '20200108';
;WITH src AS
(
  SELECT f.TheDate, TransactionCount = COUNT(t.KeyColumn)
    FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
    LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
      ON t.EventTime >= f.TheDate
     AND t.EventTime <  DATEADD(DAY, 1, f.TheDate)
    GROUP BY f.TheDate
),
gaps AS
(
  SELECT TheDate,
         TransactionCount,
         gap = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src
  WHERE TransactionCount = 0
),
islands AS
(
  SELECT TheDate,
         TransactionCount,
         island = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src WHERE TransactionCount > 0
),
rawdata(TheDate, TransactionCount, Label) AS
(
  SELECT TheDate,
         TransactionCount,
         Label = 'gap ' + RTRIM(DENSE_RANK() OVER (ORDER BY gap))
  FROM gaps
  UNION ALL
  SELECT TheDate,
         TransactionCount,
         Label = 'island ' + RTRIM(DENSE_RANK() OVER (ORDER BY island))
  FROM islands
)
SELECT WindowStart      = MIN(TheDate),
       WindowEnd        = MAX(TheDate),
       TransactionCount = SUM(TransactionCount),
       DayStreak        = COUNT(*)
FROM rawdata
GROUP BY Label
ORDER BY WindowStart;

Results:

WindowStart   WindowEnd    TransactionCount   DayStreak
-----------   ----------   ----------------   ---------
2020-01-01    2020-01-01   8                  1
2020-01-02    2020-01-04   0                  3
2020-01-05    2020-01-06   10                 2
2020-01-07    2020-01-07   0                  1
2020-01-08    2020-01-08   7                  1

If the label provides a better visual cue than TransactionCount alone, you can include the label in the output.
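
As a rough sketch of what including the label looks like, the batch below adds Label to the select list of the final grouping query. To keep it self-contained, the hypothetical table variable @rawdata stands in for the rawdata CTE and is loaded with the eight labeled rows shown earlier:

```sql
-- Hypothetical stand-in for the rawdata CTE, loaded with the labeled rows.
DECLARE @rawdata table
(
  TheDate          date PRIMARY KEY,
  TransactionCount int,
  Label            varchar(32)
);

INSERT @rawdata(TheDate, TransactionCount, Label)
  VALUES ('20200101', 8, 'island 1'),
         ('20200102', 0, 'gap 1'),
         ('20200103', 0, 'gap 1'),
         ('20200104', 0, 'gap 1'),
         ('20200105', 1, 'island 2'),
         ('20200106', 9, 'island 2'),
         ('20200107', 0, 'gap 2'),
         ('20200108', 7, 'island 3');

SELECT Label,                                    -- already in GROUP BY, so it can be selected
       WindowStart      = MIN(TheDate),
       WindowEnd        = MAX(TheDate),
       TransactionCount = SUM(TransactionCount),
       DayStreak        = COUNT(*)
  FROM @rawdata
  GROUP BY Label
  ORDER BY WindowStart;
```

This returns the same five rows as above, with island 1, gap 1, island 2, gap 2, and island 3 as the leading column.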

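Of course, you may need to solve more exotic gaps and islands problems, and there are many good treatments of the topic, including others on this site.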

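Conclusion

In the first two parts of this series, I have shown several examples where a calendar table can help solve, or at least simplify, common date-related business problems. In the next part, I will demonstrate how to use a calendar table to configure and optimize schedules, predict conflicts, and handle other complex scheduling needs.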


Original link: https://codingdict.com/