想象一下下表(称为TestTable):
TestTable
id somedate somevalue -- -------- --------- 45 01/Jan/09 3 23 08/Jan/09 5 12 02/Feb/09 0 77 14/Feb/09 7 39 20/Feb/09 34 33 02/Mar/09 6
我想要一个按日期顺序返回运行总计的查询,例如:
id somedate somevalue runningtotal -- -------- --------- ------------ 45 01/Jan/09 3 3 23 08/Jan/09 5 8 12 02/Feb/09 0 8 77 14/Feb/09 7 15 39 20/Feb/09 34 49 33 02/Mar/09 6 55
我知道在SQL Server 2000/2005/2008中可以通过多种方式进行此操作。
我对使用aggregating-set-statement技巧的这种方法特别感兴趣:
INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal) SELECT id, somedate, somevalue, null FROM TestTable ORDER BY somedate DECLARE @RunningTotal int SET @RunningTotal = 0 UPDATE @AnotherTbl SET @RunningTotal = runningtotal = @RunningTotal + somevalue FROM @AnotherTbl
…这是非常有效的,但是我听说周围存在一些问题,因为您不一定能保证该UPDATE语句将以正确的顺序处理行。也许我们可以获得有关该问题的明确答案。
UPDATE
但是,人们还有其他建议的方式吗?
编辑:现在使用带有上面的设置和“更新技巧”示例的SqlFiddle
更新 ,如果您正在运行SQL Server2012,请参见:
在SQL Server 2012中,可以将SUM()与OVER()子句一起使用。
select id, somedate, somevalue, sum(somevalue) over(order by somedate rows unbounded preceding) as runningtotal from TestTable
问题是Over子句的SQL Server实现受到一定的限制。
Oracle(和ANSI-SQL)允许您执行以下操作:
SELECT somedate, somevalue, SUM(somevalue) OVER(ORDER BY somedate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal FROM Table
SQL Server无法为您解决此问题。我的直觉告诉我,这是极少数情况下光标最快的情况之一,尽管我必须对大结果进行一些基准测试。
更新技巧很方便,但我觉得它相当脆弱。看来,如果要更新完整表,则它将按主键的顺序进行。因此,如果您将日期设置为升序的主键,则将probably是安全的。但是您依赖于一个未记录的SQL Server实现细节(同样,如果查询最终由两个proc执行,我想知道会发生什么,请参见:MAXDOP):
probably
完整的工作样本:
drop table #t create table #t ( ord int primary key, total int, running_total int) insert #t(ord,total) values (2,20) -- notice the malicious re-ordering insert #t(ord,total) values (1,10) insert #t(ord,total) values (3,10) insert #t(ord,total) values (4,1) declare @total int set @total = 0 update #t set running_total = @total, @total = @total + total select * from #t order by ord ord total running_total ----------- ----------- ------------- 1 10 10 2 20 30 3 10 40 4 1 41
您要求提供基准,这是最低要求。
最快的SAFE方式是游标,它比交叉联接的相关子查询快一个数量级。
绝对最快的方法是UPDATE技巧。我唯一关心的是,我不确定在所有情况下更新都会以线性方式进行。查询中没有明确说明的内容。
底线,对于生产代码,我将使用光标。
测试数据:
create table #t ( ord int primary key, total int, running_total int) set nocount on declare @i int set @i = 0 begin tran while @i < 10000 begin insert #t (ord, total) values (@i, rand() * 100) set @i = @i +1 end commit
测试1:
SELECT ord,total, (SELECT SUM(total) FROM #t b WHERE b.ord <= a.ord) AS b FROM #t a -- CPU 11731, Reads 154934, Duration 11135
测试2:
SELECT a.ord, a.total, SUM(b.total) AS RunningTotal FROM #t a CROSS JOIN #t b WHERE (b.ord <= a.ord) GROUP BY a.ord,a.total ORDER BY a.ord -- CPU 16053, Reads 154935, Duration 4647
测试3:
DECLARE @TotalTable table(ord int primary key, total int, running_total int) DECLARE forward_cursor CURSOR FAST_FORWARD FOR SELECT ord, total FROM #t ORDER BY ord OPEN forward_cursor DECLARE @running_total int, @ord int, @total int SET @running_total = 0 FETCH NEXT FROM forward_cursor INTO @ord, @total WHILE (@@FETCH_STATUS = 0) BEGIN SET @running_total = @running_total + @total INSERT @TotalTable VALUES(@ord, @total, @running_total) FETCH NEXT FROM forward_cursor INTO @ord, @total END CLOSE forward_cursor DEALLOCATE forward_cursor SELECT * FROM @TotalTable -- CPU 359, Reads 30392, Duration 496
测试4:
declare @total int set @total = 0 update #t set running_total = @total, @total = @total + total select * from #t -- CPU 0, Reads 58, Duration 139