I have a table in BigQuery with the following structure:
    datetime | event  | value
    ==========================
           1 | add    | 1
    ---------+--------+-------
           2 | remove | 1
    ---------+--------+-------
           6 | add    | 2
    ---------+--------+-------
           8 | add    | 3
    ---------+--------+-------
          11 | add    | 4
    ---------+--------+-------
          23 | remove | 3
    ---------+--------+-------
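I am trying to build a view that adds a column named list to each row, holding the current state of the array after that row's event has been applied. The array will never contain duplicates. The result should be: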
    datetime | event  | value | list
    ===================================
           1 | add    | 1     | [1]
    ---------+--------+-------+--------
           2 | remove | 1     | []
    ---------+--------+-------+--------
           6 | add    | 2     | [2]
    ---------+--------+-------+--------
           8 | add    | 3     | [2,3]
    ---------+--------+-------+--------
          11 | add    | 4     | [2,3,4]
    ---------+--------+-------+--------
          23 | remove | 3     | [2,4]
    ---------+--------+-------+--------
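To pin down the intended semantics independently of SQL, here is a minimal procedural sketch in Python. It is only a reference for what the view should return, not a BigQuery solution; the function name `running_lists` and the tuple layout are assumptions made for illustration.

```python
def running_lists(events):
    """Return (datetime, event, value, list) rows, where list is the
    state of the array after applying that row's event."""
    current = []  # the array as it evolves; never holds duplicates
    rows = []
    for dt, event, value in sorted(events, key=lambda e: e[0]):
        if event == "add" and value not in current:
            current.append(value)
        elif event == "remove" and value in current:
            current.remove(value)
        rows.append((dt, event, value, list(current)))  # copy the snapshot
    return rows


# Sample data taken from the table in the question.
events = [
    (1, "add", 1),
    (2, "remove", 1),
    (6, "add", 2),
    (8, "add", 3),
    (11, "add", 4),
    (23, "remove", 3),
]

for row in running_lists(events):
    print(row)
```

The difficulty in SQL is that each row's list depends on the list of the previous row, which is exactly the kind of carried state that a plain analytic function does not express directly.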
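I tried using analytic functions, without success. The API for arrays is quite limited. I think I could make it work with a recursive WITH clause, but unfortunately that is not possible in BigQuery.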
I am using BigQuery with standard SQL enabled.
The version below is for BigQuery Standard SQL and uses pure SQL only (no JS UDF):
    #standardSQL
    WITH `project.dataset.events` AS (
      -- Sample data standing in for the real table
      SELECT 1 dt, 'add' event, '1' value UNION ALL
      SELECT 2, 'remove', '1' UNION ALL
      SELECT 6, 'add', '2' UNION ALL
      SELECT 8, 'add', '3' UNION ALL
      SELECT 11, 'add', '4' UNION ALL
      SELECT 23, 'remove', '3'
    ), cum AS (
      -- Running +1/-1 sum per value: 1 while the value is in the list,
      -- 0 once it is removed (assuming adds and removes alternate)
      SELECT dt, event, value,
        SUM(IF(event = 'add', 1, -1)) OVER(PARTITION BY value ORDER BY dt) state
      FROM `project.dataset.events`
    ), pre AS (
      -- For each row a: the latest state, and its dt, of every value seen up to a.dt
      SELECT a.dt, a.event, a.value, a.state, b.value AS b_value,
        ARRAY_AGG(b.state ORDER BY b.dt DESC)[SAFE_OFFSET(0)] b_state,
        MAX(b.dt) b_dt
      FROM cum a
      JOIN cum b ON b.dt <= a.dt
      GROUP BY a.dt, a.event, a.value, a.state, b.value
    )
    -- Keep only the values whose latest state is 1, ordered by their latest dt
    SELECT dt, event, value,
      SPLIT(IFNULL(STRING_AGG(IF(b_state = 1, b_value, NULL) ORDER BY b_dt), '')) list_as_array,
      CONCAT('[', IFNULL(STRING_AGG(IF(b_state = 1, b_value, NULL) ORDER BY b_dt), ''), ']') list_as_string
    FROM pre
    GROUP BY dt, event, value
    ORDER BY dt
The result is, "surprisingly" :o), exactly the same as for the JS UDF version I answered/posted earlier:
    Row  dt  event   value  list_as_array  list_as_string
    1    1   add     1      1              [1]
    2    2   remove  1                     []
    3    6   add     2      2              [2]
    4    8   add     3      2              [2,3]
                            3
    5    11  add     4      2              [2,3,4]
                            3
                            4
    6    23  remove  3      2              [2,4]
                            4
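Note: I think the above might be slightly over-engineered, but I just don't have time to polish/optimize it. That should be doable, and I leave it to the question owner.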
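For readers who want to follow what the query does stage by stage, its logic can be mirrored in plain Python. This is an illustrative sketch only, not part of the SQL answer: the function names `cum_states` and `lists_like_query` are hypothetical, and it assumes every row has a distinct dt, as in the sample data.

```python
from collections import defaultdict

# Same sample rows as the query's inline data (values are strings there too).
events = [
    (1, "add", "1"),
    (2, "remove", "1"),
    (6, "add", "2"),
    (8, "add", "3"),
    (11, "add", "4"),
    (23, "remove", "3"),
]


def cum_states(events):
    """Mirror of the cum CTE: running +1/-1 sum per value, ordered by dt."""
    running = defaultdict(int)
    rows = []
    for dt, event, value in sorted(events, key=lambda e: e[0]):
        running[value] += 1 if event == "add" else -1
        rows.append((dt, event, value, running[value]))
    return rows


def lists_like_query(events):
    """Mirror of the pre CTE plus the final SELECT."""
    cum = cum_states(events)
    result = []
    for a_dt, a_event, a_value, _ in cum:
        # pre: self-join on b.dt <= a.dt, keeping the latest row per b.value
        latest = {}
        for b_dt, _, b_value, b_state in cum:
            if b_dt <= a_dt and (b_value not in latest or b_dt > latest[b_value][0]):
                latest[b_value] = (b_dt, b_state)
        # final SELECT: values whose latest state is 1, ordered by latest dt
        active = sorted((dt, v) for v, (dt, state) in latest.items() if state == 1)
        result.append((a_dt, a_event, a_value, [v for _, v in active]))
    return result


for row in lists_like_query(events):
    print(row)
```

The inner loop makes the cost of the self-join visible: every row is compared with every earlier row, so the work grows quadratically with the number of events. That is likely the main place where the query could be optimized.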