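Problem

In my previous article, I revisited the concept of a calendar table and explained some ways to use this data for business day calculations. This time, I want to explore how a calendar table can simplify generating date ranges, and look at some query challenges this can help you solve.

Solution

Recall that our calendar table has a key column named TheDate: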
CREATE TABLE dbo.Calendar
(
  TheDate date NOT NULL,
  …
  CONSTRAINT PK_Calendar PRIMARY KEY (TheDate),
  …
To generate a series of dates from this table, we can use a closed range, like this:
DECLARE @Start date = '20200101', @End date = '20200105';

SELECT TheDate
FROM dbo.Calendar
WHERE TheDate >= @Start
  AND TheDate <= @End;
Results:
TheDate
----------
2020-01-01
2020-01-02
2020-01-03
2020-01-04
2020-01-05
The plan for this query is about as simple as you would expect.

For convenience, though, we can wrap this in an inline table-valued function:
CREATE FUNCTION dbo.GenerateDateSeries
(
  @StartDate date,
  @EndDate   date
)
RETURNS TABLE
WITH SCHEMABINDING
AS
  RETURN
  (
    SELECT TheDate
    FROM dbo.Calendar
    WHERE TheDate >= @StartDate
      AND TheDate <= @EndDate
  );
GO
Now the same query is easier to write (on its own or combined with other queries):
DECLARE @Start date = '20200101', @End date = '20200105';

SELECT TheDate
FROM dbo.GenerateDateSeries(@Start, @End);
Same results, same plan, same performance.
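Because both ends of the range are inclusive, the function should return DATEDIFF(DAY, start, end) + 1 rows whenever the calendar table has no missing dates inside the range. The following is a small sanity check along those lines, not part of the original article; the column aliases ExpectedDays and ActualDays are made up for illustration.

```sql
-- Sketch: compare the number of days the closed range should contain
-- with the number of rows the function actually returns.
DECLARE @Start date = '20200101', @End date = '20200105';

SELECT
  ExpectedDays = DATEDIFF(DAY, @Start, @End) + 1,  -- closed range: both ends count
  ActualDays   = COUNT(*)                          -- rows found in dbo.Calendar
FROM dbo.GenerateDateSeries(@Start, @End);
```

If the two values differ, the calendar table itself has a hole in that range, and any report built on top of it would silently inherit that hole.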
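But how else can we use this function?

Filling date gaps

One of the more common reasons to generate a date range is to fill gaps in a report. Let's create a transactions table in a database called SalesStuff, and populate it with fictitious data that contains deliberate gaps: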
USE SalesStuff;
GO

DROP TABLE IF EXISTS dbo.Transactions;
GO

CREATE TABLE dbo.Transactions
(
  KeyColumn int NOT NULL,
  EventTime datetime2(3),
  CONSTRAINT PK_Transactions PRIMARY KEY(KeyColumn),
  INDEX IX_Transactions_EventTime(EventTime)
);
GO

;WITH x(n) AS
(
  SELECT TOP (40000) ROW_NUMBER() OVER (ORDER BY o.object_id)
  FROM master.sys.all_objects AS o
  CROSS JOIN master.sys.all_objects AS o2
)
INSERT dbo.Transactions(KeyColumn, EventTime)
SELECT TOP (10000) n, TheDate = DATEADD(HOUR, n*2, '20200101')
FROM x
WHERE ((n+1)/10) % 3 = 0
ORDER BY n;
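The filter ((n+1)/10) % 3 = 0 in the INSERT above is what creates the gaps: integer division puts the row numbers into buckets of ten, and only every third bucket is kept. A rough way to see this for the first 40 values of n is sketched below; the aliases Bucket and Kept are illustrative only and do not appear in the original script.

```sql
-- Sketch: show which row numbers survive the filter and where they land in time.
;WITH x(n) AS
(
  SELECT ROW_NUMBER() OVER (ORDER BY object_id)
  FROM master.sys.all_objects
)
SELECT
  n,
  Bucket    = (n+1)/10,                                        -- integer division
  Kept      = CASE WHEN ((n+1)/10) % 3 = 0 THEN 1 ELSE 0 END,  -- 1 = row is inserted
  EventTime = DATEADD(HOUR, CONVERT(int, n)*2, CONVERT(datetime2(3), '20200101'))
FROM x
WHERE n <= 40
ORDER BY n;
```

Rows 1 through 8 and 29 through 38 are kept, while 9 through 28 and 39 through 40 are dropped. At two hours per row, each dropped run of twenty rows spans 40 hours, which is why some calendar days end up with no transactions at all.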
If we simply eyeball the transactions table for the first 8 days, we can see that some days have no rows:
SELECT KeyColumn, EventTime
FROM dbo.Transactions
WHERE EventTime < '20200109';
Abridged results:
KeyColumn  EventTime
---------  ----------------
1          2020-01-01 02:00
…
8          2020-01-01 16:00
29         2020-01-03 10:00
…
35         2020-01-03 22:00
36         2020-01-04 00:00
37         2020-01-04 02:00
38         2020-01-04 04:00
59         2020-01-05 22:00
60         2020-01-06 00:00
…
67         2020-01-06 14:00
68         2020-01-06 16:00
89         2020-01-08 10:00
…
You can see that there is no data for January 2nd or January 7th. But suppose someone runs a simple report to get daily sales for those 8 days:

DECLARE @Start date = '20200101', @End date = '20200108';

SELECT TheDate = CONVERT(date, EventTime),
       TransactionCount = COUNT(*)
FROM SalesStuff.dbo.Transactions
WHERE EventTime >= @Start
  AND EventTime <  DATEADD(DAY, 1, @End)
GROUP BY CONVERT(date, EventTime)
ORDER BY TheDate;

The result will have only 6 rows, because this query has no way to represent days for which no rows exist:
TheDate     TransactionCount
----------  ----------------
2020-01-01  8
2020-01-03  7
2020-01-04  3
2020-01-05  1
2020-01-06  9
2020-01-08  7
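In the plan for that query, note the Sort operator: even though the index seek on EventTime returns the column in order, the conversion to date requires a subsequent sort.

If we want all days to show up even when there were no sales, we need another way to fill the gaps. This is where the date series generator comes in handy: we can pull all the dates from the date series function, then perform a left join against the transaction data.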
DECLARE @Start date = '20200101', @End date = '20200108';

SELECT f.TheDate,
       TransactionCount = COUNT(t.KeyColumn)
FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
  ON  t.EventTime >= f.TheDate
  AND t.EventTime <  DATEADD(DAY, 1, f.TheDate)
GROUP BY f.TheDate
ORDER BY f.TheDate;
The results show all 8 rows, with days that have no transactions represented by 0:
TheDate     TransactionCount
----------  ----------------
2020-01-01  8
2020-01-02  0
2020-01-03  7
2020-01-04  3
2020-01-05  1
2020-01-06  9
2020-01-07  0
2020-01-08  7
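One detail in the query above is easy to miss: the count is taken over t.KeyColumn rather than *. With a left join, a date that has no matching transactions still produces one row, with NULLs on the transaction side, so COUNT(*) would report 1 for an empty day while COUNT(t.KeyColumn) skips the NULL and reports 0. A minimal side-by-side sketch, assuming the same Calendar and SalesStuff databases used elsewhere in this article (the aliases CountStar and CountKey are illustrative):

```sql
-- Sketch: COUNT(*) counts the NULL-extended row from the outer join;
-- COUNT(t.KeyColumn) ignores it, so empty days correctly show 0.
DECLARE @Start date = '20200101', @End date = '20200108';

SELECT f.TheDate,
       CountStar = COUNT(*),            -- never lower than 1 here
       CountKey  = COUNT(t.KeyColumn)   -- 0 when the day has no transactions
FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
  ON  t.EventTime >= f.TheDate
  AND t.EventTime <  DATEADD(DAY, 1, f.TheDate)
GROUP BY f.TheDate
ORDER BY f.TheDate;
```

On days that do have transactions, the two columns agree; they differ only on the empty days.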
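Understandably, the plan here is more complex and has a higher estimated subtree cost, but it is still quite efficient. Note that there is no longer a Sort operator.

The cost changes very little between narrow and wide date ranges, but the overall cost may vary depending on the size, statistics, and distribution of the data in the transactions table.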
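You can also generate a finer-grained series over a range of one or more days, without any additional objects, which makes it easy to produce gap-free hourly (or sub-hourly) reports. Say that, for the first three days of January, we want a report with one row for every half hour of each day. We can first generate 48 row numbers from the calendar table, then multiply each row number (less one) by 30 to get the offset, in minutes from midnight, of each 30-minute window.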
DECLARE @minutes int = 30;
DECLARE @Start date = '20200101', @End date = '20200103';

;WITH intervals AS
(
  SELECT TOP (24*60/@minutes)
    rn = ROW_NUMBER() OVER (ORDER BY TheDate)
  FROM Calendar.dbo.Calendar
),
ranges AS
(
  SELECT TheWindow = DATEADD(MINUTE, (i.rn-1)*@minutes, CONVERT(datetime2(3), f.TheDate))
  FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
  CROSS JOIN intervals AS i
)
SELECT r.TheWindow,
       TransactionCount = COUNT(t.KeyColumn)
FROM ranges AS r
LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
  ON  t.EventTime >= r.TheWindow
  AND t.EventTime <  DATEADD(MINUTE, @minutes, r.TheWindow)
GROUP BY r.TheWindow
ORDER BY r.TheWindow;
TheWindow         TransactionCount
----------------  ----------------
2020-01-01 00:00  0
2020-01-01 00:30  0
2020-01-01 01:00  0
2020-01-01 01:30  0
2020-01-01 02:00  1
…
2020-01-03 21:30  0
2020-01-03 22:00  1
2020-01-03 22:30  0
2020-01-03 23:00  0
2020-01-03 23:30  0
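The plan for this query was captured using MAXDOP 1.

While the calendar table is small, scanning 48 rows from it for every day is a little wasteful, so feel free to allow parallelism, or to use whatever technique you already prefer for generating a series of numbers efficiently (the series of posts on this topic by Itzik Ben-Gan is excellent).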
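As one illustration of such a technique, the sketch below builds the interval numbers from a ten-row VALUES list cross joined to itself, so the calendar table is not touched at all. This is a common generic pattern rather than the approach used in this article, and the CTE names digits and nums and the alias OffsetMinutes are made up for the example.

```sql
-- Sketch: generate 1..(24*60/@minutes) without reading any table.
DECLARE @minutes int = 30;

;WITH digits(d) AS
(
  SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
nums(rn) AS
(
  -- 10 * 10 * 10 * 10 = 10,000 candidate rows, enough even for 1-minute windows
  SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
  FROM digits AS a
  CROSS JOIN digits AS b
  CROSS JOIN digits AS c
  CROSS JOIN digits AS e
)
SELECT rn,
       OffsetMinutes = (rn - 1) * @minutes   -- minutes after midnight for each window
FROM nums
WHERE rn <= 24 * 60 / @minutes
ORDER BY rn;
```

With @minutes = 30 this returns 48 rows, with offsets from 0 to 1410, which could stand in for the intervals CTE in the half-hour query.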
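Identifying date gaps and islands

The queries above help eliminate gaps, but what if we just want to highlight the gaps or the islands? The queries get a little more complex, but I have used a DENSE_RANK() pattern quite successfully, and having a calendar table (and a function that pulls from it) helps keep the query logic manageable.

In the transaction data above, we have already established that there are gaps, but let's make the data even sparser by deleting two days of data: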
DELETE SalesStuff.dbo.Transactions
WHERE EventTime >= '20200103'
  AND EventTime <  '20200105';
Now the grouped data should look like this (I am also adding a label to describe each group of rows as the data switches between present and absent):
TheDate     TransactionCount
----------  ----------------
2020-01-01  8                 -- island #1
2020-01-02  0                 -- gap #1
2020-01-03  0                 -- gap #1
2020-01-04  0                 -- gap #1
2020-01-05  1                 -- island #2
2020-01-06  9                 -- island #2
2020-01-07  0                 -- gap #2
2020-01-08  7                 -- island #3
The query I often use to return both gaps and islands together (and which produces exactly the previous result set) looks like this:
DECLARE @Start date = '20200101', @End date = '20200108';

;WITH src AS
(
  SELECT f.TheDate,
         TransactionCount = COUNT(t.KeyColumn)
  FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
  LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
    ON  t.EventTime >= f.TheDate
    AND t.EventTime <  DATEADD(DAY, 1, f.TheDate)
  GROUP BY f.TheDate
),
gaps AS
(
  SELECT TheDate, TransactionCount,
         gap = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src
  WHERE TransactionCount = 0
),
islands AS
(
  SELECT TheDate, TransactionCount,
         island = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src
  WHERE TransactionCount > 0
),
rawdata(TheDate, TransactionCount, Label) AS
(
  SELECT TheDate, TransactionCount,
         Label = 'gap ' + RTRIM(DENSE_RANK() OVER (ORDER BY gap))
  FROM gaps
  UNION ALL
  SELECT TheDate, TransactionCount,
         Label = 'island ' + RTRIM(DENSE_RANK() OVER (ORDER BY island))
  FROM islands
)
SELECT TheDate, Label, TransactionCount
FROM rawdata
ORDER BY TheDate;
This returns the following results:
TheDate     Label     TransactionCount
----------  --------  ----------------
2020-01-01  island 1  8
2020-01-02  gap 1     0
2020-01-03  gap 1     0
2020-01-04  gap 1     0
2020-01-05  island 2  1
2020-01-06  island 2  9
2020-01-07  gap 2     0
2020-01-08  island 3  7
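The grouping works because, within a run of consecutive dates, the date and its DENSE_RANK() both advance by one per row, so subtracting the rank from the date gives the same value for every row in the run. A break in the dates shifts that value forward. The sketch below shows the intermediate values for the four gap dates from the results above, supplied as literals so it can run on its own; the column names rk and anchor are illustrative only.

```sql
-- Sketch: date minus rank is constant within each run of consecutive dates.
;WITH d(TheDate) AS
(
  SELECT CONVERT(date, v)
  FROM (VALUES ('20200102'), ('20200103'), ('20200104'), ('20200107')) AS x(v)
)
SELECT TheDate,
       rk     = DENSE_RANK() OVER (ORDER BY TheDate),
       anchor = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
FROM d
ORDER BY TheDate;

-- TheDate     rk  anchor
-- 2020-01-02  1   2020-01-01
-- 2020-01-03  2   2020-01-01
-- 2020-01-04  3   2020-01-01
-- 2020-01-07  4   2020-01-03
```

Applying a second DENSE_RANK() ordered by the anchor value then numbers the runs 1, 1, 1, 2, which is where the "gap 1" and "gap 2" labels come from.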
If I only want to find the gaps, I can start from the earlier grouped query and apply DENSE_RANK() to the results:
DECLARE @Start date = '20200101', @End date = '20200108';

;WITH src AS
(
  SELECT f.TheDate,
         TransactionCount = COUNT(t.KeyColumn)
  FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
  LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
    ON  t.EventTime >= f.TheDate
    AND t.EventTime <  DATEADD(DAY, 1, f.TheDate)
  GROUP BY f.TheDate
),
gaps AS
(
  SELECT TheDate,
         gap = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src
  WHERE TransactionCount = 0
)
SELECT TheDate,
       [Gap #] = DENSE_RANK() OVER (ORDER BY gap)
FROM gaps
ORDER BY TheDate;

TheDate     Gap #
----------  -----
2020-01-02  1
2020-01-03  1
2020-01-04  1
2020-01-07  2
Similarly, if I want to identify the islands, I can flip the WHERE clause in the second CTE:
DECLARE @Start date = '20200101', @End date = '20200108';

;WITH src AS
(
  SELECT f.TheDate,
         TransactionCount = COUNT(t.KeyColumn)
  FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
  LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
    ON  t.EventTime >= f.TheDate
    AND t.EventTime <  DATEADD(DAY, 1, f.TheDate)
  GROUP BY f.TheDate
),
islands AS
(
  SELECT TheDate,
         island = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src
  WHERE TransactionCount > 0
)
SELECT TheDate,
       [Island #] = DENSE_RANK() OVER (ORDER BY island)
FROM islands
ORDER BY TheDate;

TheDate     Island #
----------  --------
2020-01-01  1
2020-01-05  2
2020-01-06  2
2020-01-08  3
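If I want to combine gaps and islands, show each set on a single row, and even highlight the length of each streak along with its transaction count, I can group one step further, like this: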
DECLARE @Start date = '20200101', @End date = '20200108';

;WITH src AS
(
  SELECT f.TheDate,
         TransactionCount = COUNT(t.KeyColumn)
  FROM Calendar.dbo.GenerateDateSeries(@Start, @End) AS f
  LEFT OUTER JOIN SalesStuff.dbo.Transactions AS t
    ON  t.EventTime >= f.TheDate
    AND t.EventTime <  DATEADD(DAY, 1, f.TheDate)
  GROUP BY f.TheDate
),
gaps AS
(
  SELECT TheDate, TransactionCount,
         gap = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src
  WHERE TransactionCount = 0
),
islands AS
(
  SELECT TheDate, TransactionCount,
         island = DATEADD(DAY, DENSE_RANK() OVER (ORDER BY TheDate) * -1, TheDate)
  FROM src
  WHERE TransactionCount > 0
),
rawdata(TheDate, TransactionCount, Label) AS
(
  SELECT TheDate, TransactionCount,
         Label = 'gap ' + RTRIM(DENSE_RANK() OVER (ORDER BY gap))
  FROM gaps
  UNION ALL
  SELECT TheDate, TransactionCount,
         Label = 'island ' + RTRIM(DENSE_RANK() OVER (ORDER BY island))
  FROM islands
)
SELECT WindowStart      = MIN(TheDate),
       WindowEnd        = MAX(TheDate),
       TransactionCount = SUM(TransactionCount),
       DayStreak        = COUNT(*)
FROM rawdata
GROUP BY Label
ORDER BY WindowStart;

WindowStart  WindowEnd   TransactionCount  DayStreak
-----------  ----------  ----------------  ---------
2020-01-01   2020-01-01  8                 1
2020-01-02   2020-01-04  0                 3
2020-01-05   2020-01-06  10                2
2020-01-07   2020-01-07  0                 1
2020-01-08   2020-01-08  7                 1
You can include the label in the output if it provides a better visual cue than TransactionCount alone.
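Since the final query already groups by the label, adding it to the output only means adding it to the SELECT list. The sketch below demonstrates this on a stand-in for the rawdata CTE, populated with the labeled rows from the earlier result set as literals so it runs without the Calendar or SalesStuff databases; in the real query you would keep the CTE chain and change only the final SELECT.

```sql
-- Sketch: the final grouping step with Label included in the output.
;WITH rawdata(TheDate, TransactionCount, Label) AS
(
  SELECT CONVERT(date, d), c, l
  FROM (VALUES
    ('20200101', 8, 'island 1'),
    ('20200102', 0, 'gap 1'),
    ('20200103', 0, 'gap 1'),
    ('20200104', 0, 'gap 1'),
    ('20200105', 1, 'island 2'),
    ('20200106', 9, 'island 2'),
    ('20200107', 0, 'gap 2'),
    ('20200108', 7, 'island 3')
  ) AS v(d, c, l)
)
SELECT Label,
       WindowStart      = MIN(TheDate),
       WindowEnd        = MAX(TheDate),
       TransactionCount = SUM(TransactionCount),
       DayStreak        = COUNT(*)
FROM rawdata
GROUP BY Label
ORDER BY WindowStart;
```

This returns the same five rows as before, each prefixed with its label (island 1, gap 1, island 2, gap 2, island 3).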
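Of course, there are more exotic gaps and islands problems you may need to solve, and there are plenty of excellent treatments of this topic, including here on this site.

Conclusion

In the first two parts of this series, I have shown several examples where a calendar table can help solve, or at least simplify, common date-related business problems. In the next part, I will demonstrate how to use a calendar table to configure and optimize schedules, predict conflicts, and handle other complex scheduling requirements.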
Original link: https://codingdict.com/