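This is a follow-up to the question "Multi-level pivot table in Google BigQuery", where I asked whether a nested pivot table can be built in Google BigQuery with a single query. It can, so in this follow-up I would like to explore the general case.
Here is a sample of the data I am working with (it is also in this shared Google Sheet)
Now, I would like to build a pivot table with the following properties:
Here is the pivot as built in Google Sheets:
The conceptual SQL statement here would be: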
    SELECT SUM(price), COUNT(price)
    BROKEN DOWN BY
      Studio (row), Title (row)
      Territory ID (col), Type (col)
    SORTED/LIMITED BY
      Studio ==> A-Z, LIMIT 3,
      Title ==> SUM(price) in GRAND TOTAL DESC, LIMIT 4,
      Territory ID ==> COUNT(price) in Paramount TOTAL, LIMIT 2
      Type ==> A-Z, NO LIMIT
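I am not sure how to express subtotals conceptually, but we should be able to specify a subtotal for each breakdown field.
Can the above be done in a single SQL statement in Google BigQuery? What would the steps be to generate it?
Q. What if we do the aggregation and get 10 million results? Unless we apply limits and the like in BigQuery, the amount of data to transfer would be huge.
Let's clarify the challenge here:
Typically, you would run something like the following on the back end and load the result into a visualization tool (the front end) for further manipulation such as sorting, limiting, pivoting, etc.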
    #standardSQL
    SELECT
      Studio,
      Title,
      TerritoryID,
      Type,
      SUM(Price) AS Price,
      COUNT(1) AS Volume
    FROM YourTable
    GROUP BY Studio, Title, TerritoryID, Type
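As you mentioned, the result in this case can easily exceed 10 million rows, and you want to reduce its size without losing the ability to present the final data in the front-end pivot/visualization.
A. Recommendation / Solution
The following shows how to do this by applying the sorting and limiting on the back end (which greatly reduces the result size) without losing the ability to pivot and to still show totals, etc.
Let's start with a simplified version of the final query
Assume that, based on known criteria, we know in advance which studios, titles, territories and types should be selected. In that case, the query below returns the needed data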
    #standardSQL
    WITH Studios AS (
      -- Hard-coded studios (not referenced below; Titles already carries Studio)
      SELECT 'Fox' AS Studio UNION ALL
      SELECT 'Paramount'
    ),
    Titles AS (
      -- Hard-coded (Studio, Title) pairs to keep
      SELECT 'Fox' AS Studio, 'Best Laid Plans' AS Title UNION ALL
      SELECT 'Fox', 'Homecoming' UNION ALL
      SELECT 'Paramount', 'Titanic' UNION ALL
      SELECT 'Paramount', 'Homecoming'
    ),
    Territories AS (
      -- Hard-coded territories to keep
      SELECT 'US' AS TerritoryID UNION ALL
      SELECT 'GB'
    ),
    Totals AS (
      -- Anything not matched by the joins is folded into 'Other'
      SELECT
        IFNULL(b.Studio, 'Other') AS Studio,
        IFNULL(b.Title, 'Other') AS Title,
        IFNULL(c.TerritoryID, 'Other') AS TerritoryID,
        Type,
        ROUND(SUM(Price), 2) AS Price,
        COUNT(1) AS Volume
      FROM yourTable AS a
      LEFT JOIN Titles AS b
        ON a.Studio = b.Studio AND a.Title = b.Title
      LEFT JOIN Territories AS c
        ON a.TerritoryID = c.TerritoryID
      GROUP BY Studio, Title, TerritoryID, Type
    )
    SELECT *
    FROM Totals
    ORDER BY Studio, Title, TerritoryID, Type
The output looks like this
    Studio     Title            TerritoryID  Type        Price    Volume
    Fox        Best Laid Plans  GB           Movie       87.32    18
    Fox        Best Laid Plans  GB           TV Episode  50.17    23
    Fox        Best Laid Plans  Other        TV Episode  1131.0   2
    Fox        Best Laid Plans  US           Movie       120.82   18
    Fox        Best Laid Plans  US           TV Episode  53.76    24
    Fox        Homecoming       GB           TV Episode  60.22    28
    Fox        Homecoming       Other        TV Episode  2262.0   4
    Fox        Homecoming       US           TV Episode  128.45   58
    Other      Other            GB           Movie       142.71   29
    Other      Other            GB           TV Episode  84.8     40
    Other      Other            Other        Movie       3292.0   4
    Other      Other            Other        TV Episode  3282.0   16
    Other      Other            US           Movie       52.92    8
    Other      Other            US           TV Episode  233.05   101
    Paramount  Homecoming       GB           Movie       18.96    4
    Paramount  Homecoming       US           Movie       124.84   16
    Paramount  Titanic          GB           Movie       41.92    8
    Paramount  Titanic          Other        Movie       12.0     4
    Paramount  Titanic          US           Movie       139.84   16
You can easily feed this to the UI and visualize it in whatever way you need
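To make the 'Other' bucketing in the Totals CTE easier to follow, here is a minimal Python sketch of the same idea on a handful of in-memory rows. It is only an illustration under stated assumptions: the sample rows, the selected values, and the helper name `bucket_totals` are all made up, and the LEFT JOIN plus IFNULL is imitated with simple set membership.

```python
from collections import defaultdict

# Hypothetical raw rows: (Studio, Title, TerritoryID, Type, Price).
ROWS = [
    ("Fox", "Homecoming", "US", "TV Episode", 2.0),
    ("Fox", "Homecoming", "FR", "TV Episode", 3.0),
    ("Fox", "Unknown Show", "US", "TV Episode", 1.0),
    ("Sony", "Homecoming", "GB", "Movie", 4.0),
    ("Paramount", "Titanic", "US", "Movie", 5.0),
    ("Paramount", "Titanic", "US", "Movie", 6.0),
]

# Hypothetical pre-selected values, standing in for the Titles and Territories CTEs.
TITLES = {("Fox", "Homecoming"), ("Paramount", "Titanic")}
TERRITORIES = {"US", "GB"}


def bucket_totals(rows, titles, territories):
    """Aggregate Price and Volume, folding unselected values into 'Other'."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [price sum, row count]
    for studio, title, territory, type_, price in rows:
        # Like the join on (Studio, Title): both become 'Other' together.
        if (studio, title) not in titles:
            studio, title = "Other", "Other"
        if territory not in territories:
            territory = "Other"
        cell = totals[(studio, title, territory, type_)]
        cell[0] += price
        cell[1] += 1
    return {key: (round(p, 2), n) for key, (p, n) in sorted(totals.items())}


totals = bucket_totals(ROWS, TITLES, TERRITORIES)
for key, (price, volume) in totals.items():
    print(*key, price, volume, sep=" | ")
```

Because every unselected value lands in an 'Other' bucket instead of being dropped, the volumes still add up to the number of input rows, which is what keeps grand totals available to the front end.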
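Now, let's implement the actual criteria for each dimension instead of using hard-coded values. The only change in the query below (relative to the simplified skeleton query) is in these CTEs: Studios, Titles and Territories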
    #standardSQL
    WITH Studios AS (
      -- First 3 studios alphabetically (not referenced by the CTEs below)
      SELECT DISTINCT Studio
      FROM yourTable
      ORDER BY Studio
      LIMIT 3
    ),
    Titles AS (
      -- Top 4 titles per studio by total price
      SELECT Studio, Title
      FROM (
        SELECT Studio, Title,
          ROW_NUMBER() OVER(PARTITION BY Studio ORDER BY Price DESC) AS pos
        FROM (
          SELECT Studio, Title, SUM(Price) AS Price
          FROM yourTable
          GROUP BY Studio, Title
        )
      )
      WHERE pos <= 4
    ),
    Territories AS (
      -- Top 2 territories by row count within Paramount
      SELECT TerritoryID
      FROM yourTable
      WHERE Studio = 'Paramount'
      GROUP BY TerritoryID
      ORDER BY COUNT(1) DESC
      LIMIT 2
    ),
    Totals AS (
      -- Anything not matched by the joins is folded into 'Other'
      SELECT
        IFNULL(b.Studio, 'Other') AS Studio,
        IFNULL(b.Title, 'Other') AS Title,
        IFNULL(c.TerritoryID, 'Other') AS TerritoryID,
        Type,
        ROUND(SUM(Price), 2) AS Price,
        COUNT(1) AS Volume
      FROM yourTable AS a
      LEFT JOIN Titles AS b
        ON a.Studio = b.Studio AND a.Title = b.Title
      LEFT JOIN Territories AS c
        ON a.TerritoryID = c.TerritoryID
      GROUP BY Studio, Title, TerritoryID, Type
    )
    SELECT *
    FROM Totals
    WHERE TerritoryID != 'Other'
    ORDER BY Studio, TerritoryID DESC, Type, Price DESC, Title
The result is:
    Studio     Title                  TerritoryID  Type        Price    Volume
    Fox        Best Laid Plans        US           Movie       120.82   18
    Fox        Titanic                US           Movie       52.92    8
    Fox        1:00 P.M. - 2:00 P.M.  US           TV Episode  187.25   81
    Fox        Homecoming             US           TV Episode  128.45   58
    Fox        Best Laid Plans        US           TV Episode  53.76    24
    Fox        Best Laid Plans        GB           Movie       87.32    18
    Fox        Titanic                GB           Movie       78.84    16
    Fox        1:00 P.M. - 2:00 P.M.  GB           TV Episode  61.42    28
    Fox        Homecoming             GB           TV Episode  60.22    28
    Fox        Best Laid Plans        GB           TV Episode  50.17    23
    Paramount  Titanic                US           Movie       139.84   16
    Paramount  Homecoming             US           Movie       124.84   16
    Paramount  Titanic                GB           Movie       41.92    8
    Paramount  Homecoming             GB           Movie       18.96    4
    Sony       Best Laid Plans        US           TV Episode  22.9     10
    Sony       Homecoming             US           TV Episode  22.9     10
    Sony       Best Laid Plans        GB           Movie       63.87    13
    Sony       Homecoming             GB           TV Episode  18.81    9
    Sony       Best Laid Plans        GB           TV Episode  4.57     3
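The two selection rules used by the Titles and Territories CTEs (top N titles per studio by summed price, and top N territories by row count within one studio) can be sketched in plain Python as follows. This is an illustration only: the rows and the function names `top_titles_per_studio` and `top_territories` are hypothetical. One assumption differs from the SQL: ties are broken alphabetically here so the output is deterministic, whereas the order of tied rows under ROW_NUMBER() or ORDER BY ... LIMIT is not guaranteed.

```python
from collections import Counter, defaultdict

# Hypothetical raw rows: (Studio, Title, TerritoryID, Price).
ROWS = [
    ("Fox", "A", "US", 5.0),
    ("Fox", "A", "GB", 1.0),
    ("Fox", "B", "US", 4.0),
    ("Fox", "C", "FR", 10.0),
    ("Sony", "A", "GB", 1.0),
    ("Paramount", "T", "US", 2.0),
    ("Paramount", "T", "US", 2.0),
    ("Paramount", "T", "GB", 2.0),
    ("Paramount", "H", "GB", 3.0),
    ("Paramount", "H", "FR", 1.0),
]


def top_titles_per_studio(rows, limit):
    """Rank titles by SUM(Price) within each studio and keep the first `limit`."""
    sums = defaultdict(float)  # (Studio, Title) -> summed price
    for studio, title, _territory, price in rows:
        sums[(studio, title)] += price
    by_studio = defaultdict(list)
    for (studio, title), total in sums.items():
        by_studio[studio].append((total, title))
    selected = set()
    for studio, ranked in by_studio.items():
        # Highest total first; alphabetical tie-break is an assumption of this sketch.
        ranked.sort(key=lambda item: (-item[0], item[1]))
        selected.update((studio, title) for _total, title in ranked[:limit])
    return selected


def top_territories(rows, studio, limit):
    """Rank territories by row count within one studio and keep the first `limit`."""
    counts = Counter(t for s, _title, t, _price in rows if s == studio)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [territory for territory, _count in ranked[:limit]]


titles = top_titles_per_studio(ROWS, 2)
territories = top_territories(ROWS, "Paramount", 2)
print(sorted(titles))
print(territories)
```

The two returned collections play the role of the lookup sets that the Totals step joins against.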
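The point here is: although BigQuery is extremely efficient at crunching billions of rows and extracting the information you need, using BigQuery to shape the result so that it mirrors how it will be rendered in the presentation layer of the client UI is very ineffective. Instead, you should pass this data to the UI and process it with the visualization code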
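As a rough idea of what "process it with the visualization code" could look like, here is a minimal Python sketch that pivots the reduced result on the client side and computes per-studio subtotals. It is a sketch under assumptions, not a definitive implementation: the four rows are the Paramount rows from the result above, while the function name `pivot` and the nested-dictionary layout are hypothetical choices.

```python
from collections import defaultdict

# Paramount rows from the result above:
# (Studio, Title, TerritoryID, Type, Price, Volume).
RESULT = [
    ("Paramount", "Titanic", "US", "Movie", 139.84, 16),
    ("Paramount", "Homecoming", "US", "Movie", 124.84, 16),
    ("Paramount", "Titanic", "GB", "Movie", 41.92, 8),
    ("Paramount", "Homecoming", "GB", "Movie", 18.96, 4),
]


def pivot(rows):
    """Put (Studio, Title) on rows and (TerritoryID, Type) on columns."""
    # (Studio, Title) -> {(TerritoryID, Type): (Price, Volume)}
    cells = defaultdict(dict)
    # Studio -> {(TerritoryID, Type): [price sum, volume sum]}
    subtotals = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for studio, title, territory, type_, price, volume in rows:
        column = (territory, type_)
        cells[(studio, title)][column] = (price, volume)
        subtotal = subtotals[studio][column]
        subtotal[0] += price
        subtotal[1] += volume
    return cells, subtotals


cells, subtotals = pivot(RESULT)
for (studio, title), columns in sorted(cells.items()):
    print(studio, title, columns)
for studio, columns in subtotals.items():
    for column, (price, volume) in sorted(columns.items()):
        print(studio, "subtotal", column, round(price, 2), volume)
```

Note that subtotals computed this way only cover the rows that were actually returned. If totals over the full data set are needed, the 'Other' rows have to be kept in the result rather than filtered out.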