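Goal
I have downloaded a CSV file from Hotmail, but it contains a lot of duplicates. The duplicates are exact copies, and I don't know why my phone created them.
I want to get rid of the duplicates.
Approach
Write a Python script to remove the duplicates.
Technical specification
Windows XP SP 3
Python 2.7
CSV file with 400 contacts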
UPDATE: 2016
If you are happy to use the helpful more_itertools external library:
    from more_itertools import unique_everseen

    with open('1.csv', 'r') as f, open('2.csv', 'w') as out_file:
        # unique_everseen yields each line only the first time it appears
        out_file.writelines(unique_everseen(f))
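If installing an external library is not an option, the same idea can be approximated with the standard library alone. The following is only a minimal sketch of the behaviour relied on here, not the library's actual implementation; `unique_lines` and the sample data are hypothetical names and values chosen for illustration.

```python
def unique_lines(lines):
    """Yield each line the first time it appears, preserving input order."""
    seen = set()  # lines already emitted
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


# Hypothetical sample data standing in for the lines of a CSV file.
sample = ["a,1\n", "b,2\n", "a,1\n", "c,3\n", "b,2\n"]
deduped = list(unique_lines(sample))
```

Because it is a generator, it can be passed straight to `out_file.writelines(...)` without loading the whole file into a list first.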
A more efficient version of @IcyFlame's solution:
    with open('1.csv', 'r') as in_file, open('2.csv', 'w') as out_file:
        seen = set()  # set for fast O(1) amortized lookup
        for line in in_file:
            if line in seen:
                continue  # skip duplicate
            seen.add(line)
            out_file.write(line)
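One detail worth watching with a line-based comparison: if the last line of the file has no trailing newline, it will not compare equal to an otherwise identical earlier line. The sketch below is one possible way to handle that, comparing lines with the line ending stripped and also reporting how many lines were dropped. The function name `dedupe_file`, the temporary paths, and the sample contacts are assumptions made for the demo, and the sketch still assumes every record sits on a single line (no quoted fields containing line breaks).

```python
import os
import tempfile


def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path without repeated lines; return the number dropped."""
    seen = set()  # set for fast O(1) amortized lookup
    dropped = 0
    with open(src_path, "r") as in_file, open(dst_path, "w") as out_file:
        for line in in_file:
            # Compare without the line ending so that a final line lacking
            # a trailing newline still matches an earlier identical line.
            key = line.rstrip("\r\n")
            if key in seen:
                dropped += 1
                continue
            seen.add(key)
            out_file.write(key + "\n")
    return dropped


# Small self-contained demo in a temporary directory (hypothetical data).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "1.csv")
dst = os.path.join(tmp_dir, "2.csv")
with open(src, "w") as f:
    f.write("name,phone\nAnn,111\nBob,222\nAnn,111\nBob,222")  # no final newline
dropped = dedupe_file(src, dst)
with open(dst) as f:
    result = f.read()
```

For a file of around 400 contacts the memory used by the set is negligible, so keeping every distinct line in memory should not be a concern.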
To edit the same file in place you could use this:
    import fileinput

    seen = set()  # set for fast O(1) amortized lookup
    for line in fileinput.FileInput('1.csv', inplace=1):
        if line in seen:
            continue  # skip duplicate
        seen.add(line)
        print line,  # standard output is now redirected to the file
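The in-place snippet uses the Python 2 `print line,` statement, which matches the Python 2.7 setup in the question but is a syntax error under Python 3. A rough Python 3 equivalent might look like the sketch below; `dedupe_in_place` and the temporary demo file are hypothetical names used only for illustration.

```python
import fileinput
import os
import tempfile


def dedupe_in_place(path):
    """Rewrite the file at path, keeping only the first copy of each line."""
    seen = set()  # set for fast O(1) amortized lookup
    # With inplace=True, standard output is redirected into the file
    # for the duration of the loop and restored afterwards.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            if line in seen:
                continue  # skip duplicate
            seen.add(line)
            print(line, end="")  # line already carries its own newline


# Small self-contained demo on a temporary file (hypothetical data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "1.csv")
with open(path, "w") as f:
    f.write("a,1\nb,2\na,1\n")
dedupe_in_place(path)
with open(path) as f:
    result = f.read()
```

Since the original file is overwritten, it may be worth passing `backup=".bak"` to `fileinput.input` while testing, so that a copy of the original is kept.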