根据其他列中最常见的值（不包括外围值）来过滤结果

小编典典

根据其他列中最常见的值（不包括外围值）来过滤结果

sql

考虑下表和伪查询：

Distinct Customers
WHERE
Most common PaymentMethod = 'CreditCard'
AND
Most common DeliveryService = '24hr'

Customer    TransID   PaymentMethod   DeliveryService
-----------------------------------------------------
Susan       1         CreditCard        24hr
Susan       2         CreditCard        24hr
Susan       3         Cash              24hr
John        4         CreditCard        48hr
John        5         CreditCard        48hr
Diane       6         CreditCard        24hr
Steve       7         Paypal            24hr
Steve       8         CreditCard        48hr
Steve       9         Paypal            24hr


Should return (2) records:

Customer
---------
Susan
Diane

另一种看待它的方式是，我想排除少数情况，即： 我不想返回“史蒂夫”，因为尽管他曾经使用过一次信用卡，但他通常不会这样做
，我只关心多数情况跨多个列的行为。

实际上，有更多列（10列）需要应用相同的原理，因此我正在寻求一种可扩展至少可搜索100ks条记录的技术。

阅读 158

2021-05-16

共1个答案

小编典典

一种方法使用窗口函数和聚合：

with cp as (
     select customerid, paymentmethod, count(*) as cnt,
            rank() over (partition by customerid order by count(*) desc) as seqnum
     from t
     group by customerid, paymentmethod
    ),
    cd as (
     select customerid, deliveryservice, count(*) as cnt
            rank() over (partition by customerid over by count(*) desc) as seqnum
     from t
     group by customerid, deliveryservice
    )
select cp.customerid
from cp join
     cd
     on cp.customerid = cd.customerid
where (cp.seqnum = 1 and cp.PaymentMethod = 'CreditCard') and
      (cd.seqnum = 1 and cd.DeliveryService = '24hr');

因为您需要沿着两个不同维度进行排名，所以我认为您需要两个子查询（或等效查询）。

2021-05-16