我想出了一些可行的方法,但并非完全符合我的期望。这是我的解决方案:
#standardSQL UPDATE `attribution.daily_sessions_20180301_copy1` AS target SET hits = ARRAY( SELECT AS STRUCT * REPLACE(ARRAY( SELECT AS STRUCT * FROM( SELECT AS STRUCT * REPLACE(map.category AS productCategoryAttribute) FROM UNNEST(product))) AS product) FROM UNNEST(hits) ) FROM `attribution.attribute_category_map` AS map WHERE ( SELECT REPLACE(LOWER(prod.productCategory), 'amp;', '') FROM UNNEST(target.hits) AS h, UNNEST(h.product) AS prod LIMIT 1) = map.raw_name
attribute_category_map是一个具有两列的表,我在其中查找第1列中的对应值,并用第2列中的值替换目标表中的数据。我实现的最佳结果- 用相同的值更新了一行中的所有嵌套字段,这仅适用于第一个嵌套字段,而不是使用特定值更新每个嵌套字段。
主表的简化架构:
[ { "name":"sessionId", "type":"STRING", "mode":"NULLABLE" }, { "name":"hits", "type":"RECORD", "mode":"REPEATED", "fields":[ { "name":"product", "type":"RECORD", "mode":"REPEATED", "fields":[ { "name":"productCategory", "type":"STRING", "mode":"NULLABLE" }, { "name":"productCategoryAttribute", "type":"STRING", "mode":"NULLABLE" } ] } ] } ]
会话行中通常有多个匹配项,一个匹配项中通常包含多个产品。看起来像那些值(如果您嵌套):
----------------------------------------------------------------------------- sessionId | hits.product.productCategory| hit.product.productCategoryAttribute ----------------------------------------------------------------------------- 1 | automotive chemicals | null 1 | automotive tools | null 1 | null | null 2 | null | null 2 | automotive chemicals | null 2 | null | null 3 | null | null 3 | bed accessories | null 4 | null | null 4 | null | null 4 | automotive chemicals | null 4 | null | null -----------------------------------------------------------------------------
映射表的架构:
[ { "name":"raw_name", "type":"STRING", "mode":"NULLABLE" }, { "name":"category", "type":"STRING", "mode":"NULLABLE" } ]
具有这样的值:
--------------------------------------------------- raw_name |category | --------------------------------------------------- automotive chemicals |d1y2 - automotive chemicals| automotive paint |dijf1 - automotive paint | automotive tools |efw1 - automotive tools | baby & infant toys |wwfw - baby & infant toys | batteries & power |fdsv- batteries & power | bed accessories |0k77 - bed accessories | bike racks |12df - bike racks | --------------------------------------------------
结果是我想要什么:
----------------------------------------------------------------------------- sessionId | hits.product.productCategory| hit.product.productCategoryAttribute ----------------------------------------------------------------------------- 1 | automotive chemicals | d1y2 - automotive chemicals 1 | automotive tools | efw1 - automotive tools 1 | null | null 2 | null | null 2 | automotive chemicals | d1y2 - automotive chemicals 2 | null | null 3 | null | null 3 | bed accessories | 0k77 - bed accessories 4 | null | null 4 | null | null 4 | automotive chemicals | d1y2 - automotive chemicals 4 | null | null -----------------------------------------------------------------------------
我需要从主表中获取值productCategory,在map表中的raw_name列中查找它,从colum类别中获取值,并将其放入主表的productCategoryAttribute列中。主要问题是目标字段是双重嵌套的,我不知道如何直接加入它们。
下面经过测试! 保留整个表的架构/数据不变,并且仅根据各自的映射更新productCategoryAttribute的值
#standardSQL UPDATE `project.dataset.your_table` t SET hits = ARRAY( SELECT AS STRUCT * REPLACE( ARRAY( SELECT AS STRUCT product.* REPLACE( CASE WHEN map.raw_name = product.productCategory THEN category ELSE productCategoryAttribute END AS productCategoryAttribute) FROM UNNEST(product) product LEFT JOIN UNNEST(agg_map.map) map ON map.raw_name = product.productCategory ) AS product) FROM UNNEST(hits) hit ) FROM (SELECT ARRAY_AGG(row) map FROM `project.dataset.map` row) agg_map WHERE TRUE
注意:以上解决方案假定映射表不是那么大,因为它依赖于将整个映射表聚合到一个数组中