我有一张交易表:
Transactions ------------ id | account | type | date_time | amount ---------------------------------------------------- 1 | 001 | 'R' | '2012-01-01 10:01:00' | 1000 2 | 003 | 'R' | '2012-01-02 12:53:10' | 1500 3 | 003 | 'A' | '2012-01-03 13:10:01' | -1500 4 | 002 | 'R' | '2012-01-03 17:56:00' | 2000 5 | 001 | 'R' | '2012-01-04 12:30:01' | 1000 6 | 002 | 'A' | '2012-01-04 13:23:01' | -2000 7 | 003 | 'R' | '2012-01-04 15:13:10' | 3000 8 | 003 | 'R' | '2012-01-05 12:12:00' | 1250 9 | 003 | 'A' | '2012-01-06 17:24:01' | -1250
并且我希望选择所有特定类型(’R’)的商品,但不要立即(按照date_time字段的顺序)针对同一帐户提交另一种其他类型(’A’)交易的商品…
因此,在给出前面的示例的情况下,查询应引发以下行:
id | account |type | date | amount ---------------------------------------------------- 1 | 001 | 'R' | '2012-01-01 10:01:00' | 1000 5 | 001 | 'R' | '2012-01-04 12:30:01' | 1000 7 | 003 | 'R' | '2012-01-04 15:13:10' | 3000
(如您所见,第2行未显示,因为第3行“取消了” …第4行也被“第6行“取消了”;第7行出现了(即使帐户003属于已取消的第2行) ,这一次在第7行中不会被任何“ A”行取消;并且不会出现第8行(对于003帐户来说也是如此,因为现在该帐户被9取消了,也不会取消7,只是先前的一个一:8 …
我已经尝试过Joins,Where子句中的子查询,但是我真的不确定如何进行查询…
我尝试过的是:
尝试加入:
SELECT trans.type as type, trans.amount as amount, trans.date_time as dt, trans.account as acct, FROM Transactions trans INNER JOIN ( SELECT t.type AS type, t.acct AS acct, t.date_time AS date_time FROM Transactions t WHERE t.date_time > trans.date_time ORDER BY t.date_time DESC ) AS nextTrans ON nextTrans.acct = trans.acct WHERE trans.type IN ('R') AND nextTrans.type NOT IN ('A') ORDER BY DATE(trans.date_time) ASC
这引发了错误,因为我无法在MySQL中向JOIN引入外部值。
在以下位置尝试子查询:
SELECT trans.type as type, trans.amount as amount, trans.date_time as dt, trans.account as acct, FROM Transactions trans WHERE trans.type IN ('R') AND trans.datetime < ( SELECT t.date_time AS date_time FROM Transactions t WHERE t.account = trans.account ORDER BY t.date_time DESC ) AS nextTrans ON nextTrans.acct = trans.acct ORDER BY DATE(trans.date_time) ASC
这是错误的,我可以在MySQL的WHERE中引入外部值,但是我无法设法找到正确过滤所需方法的方法。
重要编辑:
我设法实现了一个解决方案,但现在需要进行认真的优化。这里是:
SELECT * FROM (SELECT t1.*, tFlagged.id AS cancId, tFlagged.type AS cancFlag FROM transactions t1 LEFT JOIN (SELECT t2.* FROM transactions t2 ORDER BY t2.date_time ASC ) tFlagged ON (t1.account=tFlagged.account AND t1.date_time < tFlagged.date_time) WHERE t1.type = 'R' GROUP BY t1.id) tCanc WHERE tCanc.cancFlag IS NULL OR tCanc.cancFlag <> 'A'
我自己加入了表格,只是考虑到相同的帐户和不错的date_time。Join按date_time排序。按ID分组,我设法仅获得加入的第一个结果,该结果恰好是同一帐户的下一个交易。
然后在外部选择中,我将那些具有“ A”的内容过滤掉,因为这意味着下一个交易实际上是对它的取消。换句话说,如果同一个帐户没有下一个交易,或者下一个交易是“ R”,则该交易不会被取消,并且必须在结果中显示…
我懂了:
+----+---------+------+---------------------+--------+--------+----------+ | id | account | type | date_time | amount | cancId | cancFlag | +----+---------+------+---------------------+--------+--------+----------+ | 1 | 001 | R | 2012-01-01 10:01:00 | 1000 | 5 | R | | 5 | 001 | R | 2012-01-04 12:30:01 | 1000 | NULL | NULL | | 7 | 003 | R | 2012-01-04 15:13:10 | 3000 | 8 | R | +----+---------+------+---------------------+--------+--------+----------+
它将每个交易与下一个交易及时关联到同一帐户,然后筛选出已取消的交易…成功!
正如我所说,现在的问题是优化。我的真实数据有很多行(因为期望通过时间来保存事务的表具有),并且对于现在约10,000行的表,此查询在1分44秒内得到了肯定的结果。我想连接就是问题……(对于那些在这里了解协议的人,我应该怎么做?在这里发起一个新问题,并将其发布为该问题的解决方案?或者只是在这里等待更多答案?)
这是基于嵌套子查询的解决方案。首先,我添加了几行来处理更多案例。例如,事务10不应被事务12取消,因为事务11介于两者之间。
> select * from transactions order by date_time; +----+---------+------+---------------------+--------+ | id | account | type | date_time | amount | +----+---------+------+---------------------+--------+ | 1 | 1 | R | 2012-01-01 10:01:00 | 1000 | | 2 | 3 | R | 2012-01-02 12:53:10 | 1500 | | 3 | 3 | A | 2012-01-03 13:10:01 | -1500 | | 4 | 2 | R | 2012-01-03 17:56:00 | 2000 | | 5 | 1 | R | 2012-01-04 12:30:01 | 1000 | | 6 | 2 | A | 2012-01-04 13:23:01 | -2000 | | 7 | 3 | R | 2012-01-04 15:13:10 | 3000 | | 8 | 3 | R | 2012-01-05 12:12:00 | 1250 | | 9 | 3 | A | 2012-01-06 17:24:01 | -1250 | | 10 | 3 | R | 2012-01-07 00:00:00 | 1250 | | 11 | 3 | R | 2012-01-07 05:00:00 | 4000 | | 12 | 3 | A | 2012-01-08 00:00:00 | -1250 | | 14 | 2 | R | 2012-01-09 00:00:00 | 2000 | | 13 | 3 | A | 2012-01-10 00:00:00 | -1500 | | 15 | 2 | A | 2012-01-11 04:00:00 | -2000 | | 16 | 2 | R | 2012-01-12 00:00:00 | 5000 | +----+---------+------+---------------------+--------+ 16 rows in set (0.00 sec)
首先,创建一个查询以针对每笔交易获取“同一帐户中最近一笔交易的日期”:
SELECT t2.*, MAX(t1.date_time) AS prev_date FROM transactions t1 JOIN transactions t2 ON (t1.account = t2.account AND t2.date_time > t1.date_time) GROUP BY t2.account,t2.date_time ORDER BY t2.date_time; +----+---------+------+---------------------+--------+---------------------+ | id | account | type | date_time | amount | prev_date | +----+---------+------+---------------------+--------+---------------------+ | 3 | 3 | A | 2012-01-03 13:10:01 | -1500 | 2012-01-02 12:53:10 | | 5 | 1 | R | 2012-01-04 12:30:01 | 1000 | 2012-01-01 10:01:00 | | 6 | 2 | A | 2012-01-04 13:23:01 | -2000 | 2012-01-03 17:56:00 | | 7 | 3 | R | 2012-01-04 15:13:10 | 3000 | 2012-01-03 13:10:01 | | 8 | 3 | R | 2012-01-05 12:12:00 | 1250 | 2012-01-04 15:13:10 | | 9 | 3 | A | 2012-01-06 17:24:01 | -1250 | 2012-01-05 12:12:00 | | 10 | 3 | R | 2012-01-07 00:00:00 | 1250 | 2012-01-06 17:24:01 | | 11 | 3 | R | 2012-01-07 05:00:00 | 4000 | 2012-01-07 00:00:00 | | 12 | 3 | A | 2012-01-08 00:00:00 | -1250 | 2012-01-07 05:00:00 | | 14 | 2 | R | 2012-01-09 00:00:00 | 2000 | 2012-01-04 13:23:01 | | 13 | 3 | A | 2012-01-10 00:00:00 | -1500 | 2012-01-08 00:00:00 | | 15 | 2 | A | 2012-01-11 04:00:00 | -2000 | 2012-01-09 00:00:00 | | 16 | 2 | R | 2012-01-12 00:00:00 | 5000 | 2012-01-11 04:00:00 | +----+---------+------+---------------------+--------+---------------------+ 13 rows in set (0.00 sec)
使用它作为子查询,以使每个事务及其前身在同一行上。使用一些过滤条件提取出我们感兴趣的交易-即,其前身是被完全取消的“ R”交易的“ A”交易-
SELECT t3.*,transactions.* FROM transactions JOIN (SELECT t2.*, MAX(t1.date_time) AS prev_date FROM transactions t1 JOIN transactions t2 ON (t1.account = t2.account AND t2.date_time > t1.date_time) GROUP BY t2.account,t2.date_time) t3 ON t3.account = transactions.account AND t3.prev_date = transactions.date_time AND t3.type='A' AND transactions.type='R' AND t3.amount + transactions.amount = 0 ORDER BY t3.date_time; +----+---------+------+---------------------+--------+---------------------+----+---------+------+---------------------+--------+ | id | account | type | date_time | amount | prev_date | id | account | type | date_time | amount | +----+---------+------+---------------------+--------+---------------------+----+---------+------+---------------------+--------+ | 3 | 3 | A | 2012-01-03 13:10:01 | -1500 | 2012-01-02 12:53:10 | 2 | 3 | R | 2012-01-02 12:53:10 | 1500 | | 6 | 2 | A | 2012-01-04 13:23:01 | -2000 | 2012-01-03 17:56:00 | 4 | 2 | R | 2012-01-03 17:56:00 | 2000 | | 9 | 3 | A | 2012-01-06 17:24:01 | -1250 | 2012-01-05 12:12:00 | 8 | 3 | R | 2012-01-05 12:12:00 | 1250 | | 15 | 2 | A | 2012-01-11 04:00:00 | -2000 | 2012-01-09 00:00:00 | 14 | 2 | R | 2012-01-09 00:00:00 | 2000 | +----+---------+------+---------------------+--------+---------------------+----+---------+------+---------------------+--------+ 4 rows in set (0.00 sec)
从上面的结果可以明显看出,我们已经快到了-我们已经确定了不需要的交易。使用LEFT JOIN我们可以将它们从整个交易集中过滤掉:
LEFT JOIN
SELECT transactions.* FROM transactions LEFT JOIN (SELECT transactions.id FROM transactions JOIN (SELECT t2.*, MAX(t1.date_time) AS prev_date FROM transactions t1 JOIN transactions t2 ON (t1.account = t2.account AND t2.date_time > t1.date_time) GROUP BY t2.account,t2.date_time) t3 ON t3.account = transactions.account AND t3.prev_date = transactions.date_time AND t3.type='A' AND transactions.type='R' AND t3.amount + transactions.amount = 0) t4 USING(id) WHERE t4.id IS NULL AND transactions.type = 'R' ORDER BY transactions.date_time; +----+---------+------+---------------------+--------+ | id | account | type | date_time | amount | +----+---------+------+---------------------+--------+ | 1 | 1 | R | 2012-01-01 10:01:00 | 1000 | | 5 | 1 | R | 2012-01-04 12:30:01 | 1000 | | 7 | 3 | R | 2012-01-04 15:13:10 | 3000 | | 10 | 3 | R | 2012-01-07 00:00:00 | 1250 | | 11 | 3 | R | 2012-01-07 05:00:00 | 4000 | | 16 | 2 | R | 2012-01-12 00:00:00 | 5000 | +----+---------+------+---------------------+--------+