小编典典

根据其他列中最常见的值(不包括外围值)来过滤结果

sql

考虑下表和伪查询:

Distinct Customers
WHERE
Most common PaymentMethod = 'CreditCard'
AND
Most common DeliveryService = '24hr'

Customer    TransID   PaymentMethod   DeliveryService
-----------------------------------------------------
Susan       1         CreditCard        24hr
Susan       2         CreditCard        24hr
Susan       3         Cash              24hr
John        4         CreditCard        48hr
John        5         CreditCard        48hr
Diane       6         CreditCard        24hr
Steve       7         Paypal            24hr
Steve       8         CreditCard        48hr
Steve       9         Paypal            24hr


Should return (2) records:

Customer
---------
Susan
Diane

另一种看待它的方式是,我想排除少数情况,即: 我不想返回“史蒂夫”,因为尽管他曾经使用过一次信用卡,但他通常不会这样做
,我只关心多数情况跨多个列的行为。

实际上,有更多列(10列)需要应用相同的原理,因此我正在寻求一种可扩展至少可搜索100ks条记录的技术。


阅读 158

收藏
2021-05-16

共1个答案

小编典典

一种方法使用窗口函数和聚合:

with cp as (
     select customerid, paymentmethod, count(*) as cnt,
            rank() over (partition by customerid order by count(*) desc) as seqnum
     from t
     group by customerid, paymentmethod
    ),
    cd as (
     select customerid, deliveryservice, count(*) as cnt
            rank() over (partition by customerid over by count(*) desc) as seqnum
     from t
     group by customerid, deliveryservice
    )
select cp.customerid
from cp join
     cd
     on cp.customerid = cd.customerid
where (cp.seqnum = 1 and cp.PaymentMethod = 'CreditCard') and
      (cd.seqnum = 1 and cd.DeliveryService = '24hr');

因为您需要沿着两个不同维度进行排名,所以我认为您需要两个子查询(或等效查询)。

2021-05-16