大查询SQL-计算字符串中的元素

admin

大查询SQL-计算字符串中的元素

sql

我有一个示例字符串数据，如下所示。

+----------------------------+
|          Items             |
|----------------------------|
| A >> B >> C                |
| A >> A >> B                |
| B >> B >> C >> C >> A      |
| B >> B >> B >> C >> A >> D |
+----------------------------+

以第一行中的“项目”为例，该“项目”包含三个元素，即A，B和C。我想按顺序计算每个元素的数量。此外，我将只关注3个元素，因此我将忽略最后一个“项”中的元素“
D”。

这就是我想要的：

+----------------------------+---------+----------+---------+----------+---------+----------+
|          Items             |  Item1  | I1_Count |  Item2  | I2_Count |  Item3  | I3_Count |
|----------------------------|---------|----------|---------|----------|---------|----------|
| A >> B >> C                |    A    |     1    |    B    |     1    |    C    |     1    | 
| A >> A >> B                |    A    |     2    |    B    |     1    |   NULL  |    NULL  |
| A >> B >> B >> C >> C      |    A    |     1    |    B    |     1    |    C    |     2    |
| B >> B >> C >> C >> A      |    B    |     2    |    C    |     2    |    A    |     1    |
| B >> B >> B >> C >> A >> D |    B    |     3    |    C    |     1    |    A    |     1    |
+----------------------------+---------+----------+---------+----------+---------+----------+

如何在SQL中实现它（我使用Google Bigquery）？

太感谢了！

阅读 180

2021-07-01

共1个答案

admin

您可以取消嵌套字符串，然后应用一些聚合逻辑：

with t as (
      select 'A >> B >> C' as items union all
      select 'A >> A >> B' as items union all
      select 'B >> B >> C >> C >> A' as items union all
      select 'B >> B >> B >> C >> A >> D' as items 
     )
select t.*,
       (select as struct max(case when n = 1 then item end) as item_1,
               max(case when n = 1 then cnt end) as item_1_cnt,
               max(case when n = 2 then item end) as item_2,
               max(case when n = 2 then cnt end) as item_2_cnt,
               max(case when n = 3 then item end) as item_3,
               max(case when n = 3 then cnt end) as item_3_cnt
        from (select item, dense_rank() over (order by min(n)) as n, count(*) as cnt
              from unnest(split(t.items, ' >> ')) item with offset n
              group by item
             ) x
       ).*
from t;

最里面的子查询使用计数器将字符串转换为行。然后将其汇总起来，以便在商品首次出现时对商品进行排序-以及商品的数量。

最后，这些汇总到所需的不同列中。

2021-07-01