小编典典

BigQuery UPDATE嵌套数组字段

sql

我想出了一些可行的方法,但并非完全符合我的期望。这是我的解决方案:

#standardSQL

UPDATE

  `attribution.daily_sessions_20180301_copy1` AS target

SET

hits = ARRAY(

  SELECT AS STRUCT * REPLACE(ARRAY(

    SELECT AS STRUCT *

    FROM(

      SELECT AS STRUCT * REPLACE(map.category AS productCategoryAttribute) FROM UNNEST(product))) AS product) FROM UNNEST(hits)

)

FROM

  `attribution.attribute_category_map`

AS map

WHERE

  (

    SELECT REPLACE(LOWER(prod.productCategory), 'amp;', '') FROM UNNEST(target.hits) AS h,

    UNNEST(h.product) AS prod LIMIT 1) = map.raw_name

attribute_category_map是一个具有两列的表,我在其中查找第1列中的对应值,并用第2列中的值替换目标表中的数据。我实现的最佳结果-
用相同的值更新了一行中的所有嵌套字段,这仅适用于第一个嵌套字段,而不是使用特定值更新每个嵌套字段。

主表的简化架构:

[  
   {  
      "name":"sessionId",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {  
      "name":"hits",
      "type":"RECORD",
      "mode":"REPEATED",
      "fields":[  
         {  
            "name":"product",
            "type":"RECORD",
            "mode":"REPEATED",
            "fields":[  
               {  
                  "name":"productCategory",
                  "type":"STRING",
                  "mode":"NULLABLE"
               },
               {  
                  "name":"productCategoryAttribute",
                  "type":"STRING",
                  "mode":"NULLABLE"
               }
            ]
         }
      ]
   }
]

会话行中通常有多个匹配项,一个匹配项中通常包含多个产品。看起来像那些值(如果您嵌套):

-----------------------------------------------------------------------------
sessionId | hits.product.productCategory| hit.product.productCategoryAttribute
-----------------------------------------------------------------------------
1         | automotive chemicals        | null
1         | automotive tools            | null
1         | null                        | null
2         | null                        | null
2         | automotive chemicals        | null
2         | null                        | null
3         | null                        | null
3         | bed accessories             | null
4         | null                        | null
4         | null                        | null
4         | automotive chemicals        | null
4         | null                        | null
-----------------------------------------------------------------------------

映射表的架构:

[  
   {  
      "name":"raw_name",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {  
      "name":"category",
      "type":"STRING",
      "mode":"NULLABLE"
   }
]

具有这样的值:

---------------------------------------------------
raw_name              |category                   |
---------------------------------------------------
automotive chemicals  |d1y2 - automotive chemicals|
automotive paint      |dijf1 - automotive paint   |
automotive tools      |efw1 - automotive tools    |
baby & infant toys    |wwfw - baby & infant toys  |
batteries & power     |fdsv- batteries & power    |
bed accessories       |0k77 - bed accessories     |
bike racks            |12df - bike racks          |
--------------------------------------------------

结果是我想要什么:

-----------------------------------------------------------------------------
    sessionId | hits.product.productCategory| hit.product.productCategoryAttribute
-----------------------------------------------------------------------------
    1         | automotive chemicals        | d1y2 - automotive chemicals
    1         | automotive tools            | efw1 - automotive tools
    1         | null                        | null
    2         | null                        | null
    2         | automotive chemicals        | d1y2 - automotive chemicals
    2         | null                        | null
    3         | null                        | null
    3         | bed accessories             | 0k77 - bed accessories
    4         | null                        | null
    4         | null                        | null
    4         | automotive chemicals        | d1y2 - automotive chemicals
    4         | null                        | null
    -----------------------------------------------------------------------------

我需要从主表中获取值productCategory,在map表中的raw_name列中查找它,从colum类别中获取值,并将其放入主表的productCategoryAttribute列中。主要问题是目标字段是双重嵌套的,我不知道如何直接加入它们。


阅读 189

收藏
2021-05-05

共1个答案

小编典典

下面经过测试!
保留整个表的架构/数据不变,并且仅根据各自的映射更新productCategoryAttribute的值

#standardSQL
UPDATE `project.dataset.your_table` t
SET hits = 
  ARRAY(
    SELECT AS STRUCT * REPLACE(
      ARRAY(
        SELECT AS STRUCT product.* REPLACE(
          CASE WHEN map.raw_name = product.productCategory THEN category 
            ELSE productCategoryAttribute END AS productCategoryAttribute)
        FROM UNNEST(product) product
        LEFT JOIN UNNEST(agg_map.map) map 
        ON map.raw_name = product.productCategory
      ) AS product)
    FROM UNNEST(hits) hit
  ) 
FROM (SELECT ARRAY_AGG(row) map FROM `project.dataset.map` row) agg_map 
WHERE TRUE

注意:以上解决方案假定映射表不是那么大,因为它依赖于将整个映射表聚合到一个数组中

2021-05-05