Remove duplicates from a list of lists based on the third item in each sublist

小编典典

python

I have a list of lists that looks like this:

c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv'] 
      ...]

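c has about 30,000 inner lists. What I want to do is eliminate duplicates based on the 4th item of each inner list, so that the list of lists above becomes: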

c = [['470', '4189.0', 'asdfgw', 'fds'],['470', '4189.0', 'qwer', 'dsfs fdv'] ...]

Here is what I have so far:

d = [] #list that will contain condensed c
d.append(c[0]) #append first element, so I can compare lists
for bact in c: #c is my list of lists with 30,000 interior list
    for items in d:
        if bact[3] != items[3]:
            d.append(bact)

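I thought this should work, but it just keeps running. I let it run for 30 minutes and then killed it. I don't think the program should take that long, so I'm guessing something is wrong with my logic.

I also have a feeling that creating a whole new list of lists is pretty silly. Any help would be much appreciated, and please feel free to nitpick, as I am still learning. Please correct my vocabulary too if it is wrong.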

2020-12-20

1 answer

小编典典

I would do it like this:

seen = set()
cond = [x for x in c if x[3] not in seen and not seen.add(x[3])]

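Explanation:

seen is a set that keeps track of the fourth element of every sublist encountered so far.
cond is the condensed list. If x[3] (where x is a sublist in c) is not in seen, x is added to cond and x[3] is added to seen.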
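The two points above can also be spelled out as an explicit loop, which some readers may find easier to follow than a comprehension with a side effect. This is only a sketch: the function name dedupe_by_index and its index parameter are made up for illustration, and the sample rows are the three shown in the question.

```python
def dedupe_by_index(rows, index=3):
    """Keep the first row seen for each distinct value of row[index].

    Hypothetical helper, not part of the original answer.
    """
    seen = set()      # values of row[index] encountered so far
    result = []       # rows kept, in their original order
    for row in rows:
        key = row[index]
        if key not in seen:   # average O(1) membership test on a set
            seen.add(key)
            result.append(row)
    return result


c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv']]

cond = dedupe_by_index(c)
print(cond)
```

Each row is visited once and each lookup in the set takes roughly constant time, so 30,000 rows should be processed almost instantly.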

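seen.add(x[3]) returns None, so not seen.add(x[3]) is always True. That part is only evaluated when x[3] not in seen is True, however, because Python uses short-circuit evaluation. When the second condition is evaluated, it always returns True and has the side effect of adding x[3] to seen. Here is another example of what is happening (print returns None and has the "side effect" of printing something):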

>>> False and not print('hi')
False
>>> True and not print('hi')
hi
True
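As for why the attempt in the question never seemed to finish, one likely reason is that the inner loop appends bact once for every entry of d whose fourth item differs from it, and d keeps growing while it is being iterated. The sketch below wraps the question's loop in a function (the name original_attempt is hypothetical) and runs it on made-up rows whose fourth items are all distinct, printing how long d ends up.

```python
def original_attempt(c):
    """The nested loop from the question, wrapped in a function for testing."""
    d = [c[0]]
    for bact in c:
        for items in d:              # d can grow while this loop is running
            if bact[3] != items[3]:
                d.append(bact)       # one append per non-matching entry in d
    return d


# Made-up rows: every row has a different fourth item.
for n in (2, 5, 10, 15):
    rows = [['470', '4189.0', 'qwer', str(k)] for k in range(n)]
    print(n, len(original_attempt(rows)))
```

With all-distinct fourth items, the length of d doubles for each new row, so the work grows exponentially with the number of distinct values rather than linearly with the number of rows.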

2020-12-20