Unfortunately, reshaping in BQ is not as easy as in R, and I can't export my data for this project.
Here is the input:
date      country  A        B    C   D
20170928  CH       3000.3   121  13  3200
20170929  CH       2800.31  137  23  1614.31
Expected output:
date      country  Metric  Value
20170928  CH       A       3000.3
20170928  CH       B       121
20170928  CH       C       13
20170928  CH       D       3200
20170929  CH       A       2800.31
20170929  CH       B       137
20170929  CH       C       23
20170929  CH       D       1614.31
Also, my real table has many more columns and rows, so I think doing this by hand would take a lot of manual work.
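Below is for BigQuery Standard SQL. It does not require one repeated SELECT per column, however many columns you have: it picks up every column in the table and turns it into a metric/value pair.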
#standardSQL
SELECT DATE, country, metric, SAFE_CAST(value AS FLOAT64) value
FROM (
  SELECT DATE, country,
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
  FROM `project.dataset.yourtable` t,
    UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(t), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('date', 'country')
You can test / play with it using dummy data, as in your question:
#standardSQL
WITH `project.dataset.yourtable` AS (
  SELECT '20170928' DATE, 'CH' country, 3000.3 A, 121 B, 13 C, 3200 D UNION ALL
  SELECT '20170929', 'CH', 2800.31, 137, 23, 1614.31
)
SELECT DATE, country, metric, SAFE_CAST(value AS FLOAT64) value
FROM (
  SELECT DATE, country,
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
    REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
  FROM `project.dataset.yourtable` t,
    UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(t), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('date', 'country')
The result is as expected:
DATE      country  metric  value
20170928  CH       A       3000.3
20170928  CH       B       121.0
20170928  CH       C       13.0
20170928  CH       D       3200.0
20170929  CH       A       2800.31
20170929  CH       B       137.0
20170929  CH       C       23.0
20170929  CH       D       1614.31
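If it helps to see why the query works, the sketch below exposes its intermediate step on a single sample row. It reuses only the functions already in the query; the `row_json` alias is just an illustrative name. The idea is that `to_json_string(t)` serializes the whole row as one flat JSON object, stripping the braces and splitting on commas yields one fragment per column, and each fragment should look like `"column":value`, which the outer query then splits on `:`.

```sql
#standardSQL
-- Sketch only: show the serialized row and the per-column fragments
-- that the main query later splits into metric and value.
WITH `project.dataset.yourtable` AS (
  SELECT '20170928' DATE, 'CH' country, 3000.3 A, 121 B, 13 C, 3200 D
)
SELECT
  to_json_string(t) AS row_json,  -- whole row as a single JSON string
  pair                            -- one fragment per column of the row
FROM `project.dataset.yourtable` t,
  UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(t), r'{|}', ''))) pair
```

Because the split is done on plain `,` and `:` characters, this approach assumes that no column value itself contains a comma or a colon; a string column with such characters would be cut in the wrong place. Also, `SAFE_CAST(value AS FLOAT64)` returns NULL for any non-numeric column that is not excluded in the WHERE clause, so extra key columns should be added to that exclusion list.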