我发现了一个类似的问题(重复记录以填补Google BigQuery中日期之间的空白),但是存在不同的情况,答案不适用。
我的数据结构如下(基本上是多个产品和合作伙伴的价格变动历史记录):
+------------+---------+---------+-------+ | date | product | partner | value | +------------+---------+---------+-------+ | 2017-01-01 | a | x | 10 | | 2017-01-01 | b | x | 15 | | 2017-01-01 | a | y | 11 | | 2017-01-01 | b | y | 16 | | 2017-01-05 | b | x | 13 | | 2017-01-07 | a | y | 15 | | 2017-01-07 | a | x | 15 | +------------+---------+---------+-------+
我需要的是一个查询(特别是用BigQuery Standard SQL编写),该查询在给定日期范围(在这种情况下2017-01-01为2017-01-10)下输出以下结果:
2017-01-01
2017-01-10
+--------------+---------+---------+-------+ | date | product | partner | value | +--------------+---------+---------+-------+ | 2017-01-01 | a | x | 10 | | 2017-01-02 | a | x | 10 | | 2017-01-03 | a | x | 10 | | 2017-01-04 | a | x | 10 | | 2017-01-05 | a | x | 10 | | 2017-01-06 | a | x | 10 | | 2017-01-07 | a | x | 15 | | 2017-01-08 | a | x | 15 | | 2017-01-09 | a | x | 15 | | 2017-01-10 | a | x | 15 | | 2017-01-01 | a | y | 11 | | 2017-01-02 | a | y | 11 | | 2017-01-03 | a | y | 11 | | 2017-01-04 | a | y | 11 | | 2017-01-05 | a | y | 11 | | 2017-01-06 | a | y | 11 | | 2017-01-07 | a | y | 15 | | 2017-01-08 | a | y | 15 | | 2017-01-09 | a | y | 15 | | 2017-01-10 | a | y | 15 | | 2017-01-01 | b | x | 15 | | 2017-01-02 | b | x | 15 | | 2017-01-03 | b | x | 15 | | 2017-01-04 | b | x | 15 | | 2017-01-05 | b | x | 13 | | 2017-01-06 | b | x | 13 | | 2017-01-07 | b | x | 13 | | 2017-01-08 | b | x | 13 | | 2017-01-09 | b | x | 13 | | 2017-01-10 | b | x | 13 | | 2017-01-01 | b | y | 16 | | 2017-01-02 | b | y | 16 | | 2017-01-03 | b | y | 16 | | 2017-01-04 | b | y | 16 | | 2017-01-05 | b | y | 16 | | 2017-01-06 | b | y | 16 | | 2017-01-07 | b | y | 16 | | 2017-01-08 | b | y | 16 | | 2017-01-09 | b | y | 16 | | 2017-01-10 | b | y | 16 | +--------------+---------+---------+-------+
对于产品和合作伙伴的每种组合,基本上都是价格历史记录,其中包含所有日期差。
我很难弄清楚如何完成这项工作,尤其是如何在同一天没有价格变动的情况下生成多行。有任何想法吗?
尝试下面
#standardSQL WITH history AS ( SELECT '2017-01-01' AS d, 'a' AS product, 'x' AS partner, 10 AS value UNION ALL SELECT '2017-01-01' AS d, 'b' AS product, 'x' AS partner, 15 AS value UNION ALL SELECT '2017-01-01' AS d, 'a' AS product, 'y' AS partner, 11 AS value UNION ALL SELECT '2017-01-01' AS d, 'b' AS product, 'y' AS partner, 16 AS value UNION ALL SELECT '2017-01-05' AS d, 'b' AS product, 'x' AS partner, 13 AS value UNION ALL SELECT '2017-01-07' AS d, 'a' AS product, 'y' AS partner, 15 AS value UNION ALL SELECT '2017-01-07' AS d, 'a' AS product, 'x' AS partner, 15 AS value ), daterange AS ( SELECT date_in_range FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-01-10')) AS date_in_range ), temp AS ( SELECT d, product, partner, value, LEAD(d) OVER(PARTITION BY product, partner ORDER BY d) AS next_d FROM history ORDER BY product, partner, d ) SELECT date_in_range, product, partner, value FROM daterange JOIN temp ON daterange.date_in_range >= PARSE_DATE('%Y-%m-%d', temp.d) AND (daterange.date_in_range < PARSE_DATE('%Y-%m-%d', temp.next_d) OR temp.next_d IS NULL) ORDER BY product, partner, date_in_range