admin

大查询SQL-计算字符串中的元素

sql

我有一个示例字符串数据,如下所示。

+----------------------------+
|          Items             |
|----------------------------|
| A >> B >> C                |
| A >> A >> B                |
| B >> B >> C >> C >> A      |
| B >> B >> B >> C >> A >> D |
+----------------------------+

以第一行中的“项目”为例,该“项目”包含三个元素,即A,B和C。我想按顺序计算每个元素的数量。此外,我将只关注3个元素,因此我将忽略最后一个“项”中的元素“
D”。

这就是我想要的:

+----------------------------+---------+----------+---------+----------+---------+----------+
|          Items             |  Item1  | I1_Count |  Item2  | I2_Count |  Item3  | I3_Count |
|----------------------------|---------|----------|---------|----------|---------|----------|
| A >> B >> C                |    A    |     1    |    B    |     1    |    C    |     1    | 
| A >> A >> B                |    A    |     2    |    B    |     1    |   NULL  |    NULL  |
| A >> B >> B >> C >> C      |    A    |     1    |    B    |     1    |    C    |     2    |
| B >> B >> C >> C >> A      |    B    |     2    |    C    |     2    |    A    |     1    |
| B >> B >> B >> C >> A >> D |    B    |     3    |    C    |     1    |    A    |     1    |
+----------------------------+---------+----------+---------+----------+---------+----------+

如何在SQL中实现它(我使用Google Bigquery)?

太感谢了!


阅读 180

收藏
2021-07-01

共1个答案

admin

您可以取消嵌套字符串,然后应用一些聚合逻辑:

with t as (
      select 'A >> B >> C' as items union all
      select 'A >> A >> B' as items union all
      select 'B >> B >> C >> C >> A' as items union all
      select 'B >> B >> B >> C >> A >> D' as items 
     )
select t.*,
       (select as struct max(case when n = 1 then item end) as item_1,
               max(case when n = 1 then cnt end) as item_1_cnt,
               max(case when n = 2 then item end) as item_2,
               max(case when n = 2 then cnt end) as item_2_cnt,
               max(case when n = 3 then item end) as item_3,
               max(case when n = 3 then cnt end) as item_3_cnt
        from (select item, dense_rank() over (order by min(n)) as n, count(*) as cnt
              from unnest(split(t.items, ' >> ')) item with offset n
              group by item
             ) x
       ).*
from t;

最里面的子查询使用计数器将字符串转换为行。然后将其汇总起来,以便在商品首次出现时对商品进行排序-以及商品的数量。

最后,这些汇总到所需的不同列中。

2021-07-01