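I have a table with a unique id field and an email field. The emails are duplicated. For each duplicated email address I want to keep only one row: the one with the latest id (the last inserted record).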
How can I achieve this?
Assume your table test contains the following data:
    select id, email from test;

    ID                     EMAIL
    ---------------------- --------------------
    1                      aaa
    2                      bbb
    3                      ccc
    4                      bbb
    5                      ddd
    6                      eee
    7                      aaa
    8                      aaa
    9                      eee
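To reproduce this sample data, a setup script along these lines could be used. The column types and the primary key are assumptions, since the question only names the id and email fields.

```sql
-- Assumed schema: the question only says the table has id and email.
CREATE TABLE test (
    id    INT NOT NULL PRIMARY KEY,   -- unique id; a higher value means inserted later
    email VARCHAR(20) NOT NULL        -- may contain duplicates
);

-- The nine sample rows from the listing above.
INSERT INTO test (id, email) VALUES
    (1, 'aaa'), (2, 'bbb'), (3, 'ccc'),
    (4, 'bbb'), (5, 'ddd'), (6, 'eee'),
    (7, 'aaa'), (8, 'aaa'), (9, 'eee');
```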
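So we need to find all duplicated emails and delete every copy except the one with the latest ID. In this case aaa, bbb and eee are duplicated, so we want to delete IDs 1, 7, 2 and 6.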
To do that, we first need to find all the duplicated emails:
    select email from test group by email having count(*) > 1;

    EMAIL
    --------------------
    aaa
    bbb
    eee
Then, from that data set, we need to find the latest ID for each of these duplicated emails:
    select max(id) as lastId, email
    from test
    where email in (
        select email from test group by email having count(*) > 1
    )
    group by email;

    LASTID                 EMAIL
    ---------------------- --------------------
    8                      aaa
    4                      bbb
    9                      eee
Finally, we can delete every row whose ID is lower than the LASTID for its email. So the solution is:
    delete test
    from test
    inner join (
        select max(id) as lastId, email
        from test
        where email in (
            select email
            from test
            group by email
            having count(*) > 1
        )
        group by email
    ) duplic on duplic.email = test.email
    where test.id < duplic.lastId;
I don't have MySQL installed on this machine right now, but this should work.
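Because a DELETE is hard to undo, one cautious option is to run the same join as a SELECT first and look at the rows it would remove. This is only a sketch of that dry run; it reuses the table and alias names from the statement above and adds nothing but an ORDER BY.

```sql
-- Dry run: same join and filter as the DELETE, but it only lists the rows.
SELECT test.id, test.email, duplic.lastId
FROM test
INNER JOIN (
    SELECT MAX(id) AS lastId, email
    FROM test
    WHERE email IN (
        SELECT email
        FROM test
        GROUP BY email
        HAVING COUNT(*) > 1
    )
    GROUP BY email
) duplic ON duplic.email = test.email
WHERE test.id < duplic.lastId
ORDER BY test.id;
```

With the sample data this should list the rows with ids 1, 2, 6 and 7, which are exactly the rows the DELETE is meant to remove.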
The delete above works, but I found a more optimized version:
    delete test
    from test
    inner join (
        select max(id) as lastId, email
        from test
        group by email
        having count(*) > 1
    ) duplic on duplic.email = test.email
    where test.id < duplic.lastId;
You can see that it deleted the oldest duplicates, i.e. 1, 7, 2 and 6:
    select * from test;

    +----+-------+
    | id | email |
    +----+-------+
    |  3 | ccc   |
    |  4 | bbb   |
    |  5 | ddd   |
    |  8 | aaa   |
    |  9 | eee   |
    +----+-------+
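If the table uses a transactional storage engine (InnoDB is assumed here), the optimized delete can be wrapped in a transaction so the result can be inspected before it becomes permanent. This is a sketch of that workflow, not part of the original answer.

```sql
START TRANSACTION;

-- The optimized delete: remove every duplicate except the highest id per email.
DELETE test
FROM test
INNER JOIN (
    SELECT MAX(id) AS lastId, email
    FROM test
    GROUP BY email
    HAVING COUNT(*) > 1
) duplic ON duplic.email = test.email
WHERE test.id < duplic.lastId;

-- Any email that still appears more than once would be listed here.
SELECT email, COUNT(*) AS copies
FROM test
GROUP BY email
HAVING COUNT(*) > 1;

-- After inspecting the result, keep the change with COMMIT,
-- or replace it with ROLLBACK to undo the delete.
COMMIT;
```

An empty result from the check query means every remaining email is unique.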
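Another version is the delete provided by Rene Limon. MySQL does not allow a subquery to select directly from the table being deleted from (error 1093), so the max(id) query is wrapped in a derived table here:

    delete from test
    where id not in (
        select lastId
        from (
            select max(id) as lastId
            from test
            group by email
        ) latest
    );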
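For comparison, the same rule can also be written as a self-join, a common MySQL idiom for this kind of cleanup. It is not taken from the answers above; the aliases t1 and t2 are only illustrative.

```sql
-- Delete every row t1 for which another row t2 has the same email and a higher id.
DELETE t1
FROM test AS t1
INNER JOIN test AS t2
    ON t2.email = t1.email
   AND t2.id > t1.id;
```

Rows with a unique email, and the highest-id row of each duplicated email, have no matching t2 and are therefore left in place.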