考虑下表和伪查询:
Distinct Customers WHERE Most common PaymentMethod = 'CreditCard' AND Most common DeliveryService = '24hr' Customer TransID PaymentMethod DeliveryService ----------------------------------------------------- Susan 1 CreditCard 24hr Susan 2 CreditCard 24hr Susan 3 Cash 24hr John 4 CreditCard 48hr John 5 CreditCard 48hr Diane 6 CreditCard 24hr Steve 7 Paypal 24hr Steve 8 CreditCard 48hr Steve 9 Paypal 24hr Should return (2) records: Customer --------- Susan Diane
另一种看待它的方式是,我想排除少数情况,即: 我不想返回“史蒂夫”,因为尽管他曾经使用过一次信用卡,但他通常不会这样做 ,我只关心多数情况跨多个列的行为。
实际上,有更多列(10列)需要应用相同的原理,因此我正在寻求一种可扩展至少可搜索100ks条记录的技术。
一种方法使用窗口函数和聚合:
with cp as ( select customerid, paymentmethod, count(*) as cnt, rank() over (partition by customerid order by count(*) desc) as seqnum from t group by customerid, paymentmethod ), cd as ( select customerid, deliveryservice, count(*) as cnt rank() over (partition by customerid over by count(*) desc) as seqnum from t group by customerid, deliveryservice ) select cp.customerid from cp join cd on cp.customerid = cd.customerid where (cp.seqnum = 1 and cp.PaymentMethod = 'CreditCard') and (cd.seqnum = 1 and cd.DeliveryService = '24hr');
因为您需要沿着两个不同维度进行排名,所以我认为您需要两个子查询(或等效查询)。