我有一个示例字符串数据,如下所示。
+----------------------------+ | Items | |----------------------------| | A >> B >> C | | A >> A >> B | | B >> B >> C >> C >> A | | B >> B >> B >> C >> A >> D | +----------------------------+
以第一行中的“项目”为例,该“项目”包含三个元素,即A,B和C。我想按顺序计算每个元素的数量。此外,我将只关注3个元素,因此我将忽略最后一个“项”中的元素“ D”。
这就是我想要的:
+----------------------------+---------+----------+---------+----------+---------+----------+ | Items | Item1 | I1_Count | Item2 | I2_Count | Item3 | I3_Count | |----------------------------|---------|----------|---------|----------|---------|----------| | A >> B >> C | A | 1 | B | 1 | C | 1 | | A >> A >> B | A | 2 | B | 1 | NULL | NULL | | A >> B >> B >> C >> C | A | 1 | B | 1 | C | 2 | | B >> B >> C >> C >> A | B | 2 | C | 2 | A | 1 | | B >> B >> B >> C >> A >> D | B | 3 | C | 1 | A | 1 | +----------------------------+---------+----------+---------+----------+---------+----------+
如何在SQL中实现它(我使用Google Bigquery)?
太感谢了!
您可以取消嵌套字符串,然后应用一些聚合逻辑:
with t as ( select 'A >> B >> C' as items union all select 'A >> A >> B' as items union all select 'B >> B >> C >> C >> A' as items union all select 'B >> B >> B >> C >> A >> D' as items ) select t.*, (select as struct max(case when n = 1 then item end) as item_1, max(case when n = 1 then cnt end) as item_1_cnt, max(case when n = 2 then item end) as item_2, max(case when n = 2 then cnt end) as item_2_cnt, max(case when n = 3 then item end) as item_3, max(case when n = 3 then cnt end) as item_3_cnt from (select item, dense_rank() over (order by min(n)) as n, count(*) as cnt from unnest(split(t.items, ' >> ')) item with offset n group by item ) x ).* from t;
最里面的子查询使用计数器将字符串转换为行。然后将其汇总起来,以便在商品首次出现时对商品进行排序-以及商品的数量。
最后,这些汇总到所需的不同列中。