Finding duplicate items in a Python list of lists of tuples

小编典典

python

I want to find the matching items in the list given below. My list may be very large.

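The first item of a tuple, 'N1_10', is duplicated and paired with a different item in another sub-list:

Tuple in the first sub-list of ListA: ('N1_10', 'N2_28')
Tuple in the second sub-list of ListA: ('N1_10', 'N3_98')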

ListA  = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
          [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
          [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

The output I want is

Output -> [('N1_10','N2_28','N3_98'), ...., and anything else that matches one of the keys goes into the same tuple]

If you think that changing the data structure of ListA is a better option, please feel free to suggest it! Thanks for your help!

Simplified version

List A = [[(a,x),(b,k),(c,l),(d,m)],[(e,d),(a,p),(g,s)],[…],[…] ....]

wantedOutput -> [(a,x,p),(b,k),(c,l),(d,m,e),(g,s)..]

2021-01-20

1 answer

小编典典

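Update: After re-reading your question, it looks like you are trying to create equivalence classes rather than collect the values for each key. If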

[[(1, 2), (3, 4), (2, 3)]]

should become

[(1, 2, 3, 4)]

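, then you will need to interpret your input as a graph and apply a connected-components algorithm. You could convert your data structure to an adjacency-list representation and traverse it with a breadth-first or depth-first search, or you could iterate over the list and build disjoint sets. In either case, your code will suddenly involve a lot of graph-related complexity, and it will be hard to give any guarantees about output order based on the order of the input. Here is an algorithm based on breadth-first search: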

import collections

# build an adjacency list representation of your input
graph = collections.defaultdict(set)
for sublist in ListA:
    for first, second in sublist:
        graph[first].add(second)
        graph[second].add(first)

# breadth-first search the graph to produce the output
output = []
marked = set() # a set of all nodes whose connected component is known
for node in graph:
    if node not in marked:
        # this node is not in any previously seen connected component
        # run a breadth-first search to determine its connected component
        frontier = set([node])
        connected_component = []
        while frontier:
            marked |= frontier
            connected_component.extend(frontier)

            # find all unmarked nodes directly connected to frontier nodes
            # they will form the new frontier
            new_frontier = set()
            for frontier_node in frontier:  # distinct name: avoids shadowing the outer loop's `node`
                new_frontier |= graph[frontier_node] - marked
            frontier = new_frontier
        output.append(tuple(connected_component))

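However, don't just copy this without understanding it. Work out what it is doing, or write your own implementation. You will probably need to be able to maintain it. (I would have used pseudocode, but Python is already practically as simple as pseudocode.)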
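The answer also mentions building disjoint sets as an alternative to a graph search. The following is a minimal sketch of that idea, not code from the original answer: the function name `connected_groups` and the helper `find` are hypothetical, and it assumes every item is hashable and every inner element is a 2-tuple.

```python
def connected_groups(list_a):
    """Group items that are linked, directly or indirectly, by any pair."""
    parent = {}  # maps each item to its parent in the disjoint-set forest

    def find(item):
        # Register unseen items as their own root.
        parent.setdefault(item, item)
        # Walk up to the root, shortening the path along the way.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for sublist in list_a:
        for first, second in sublist:
            root_a, root_b = find(first), find(second)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two sets

    # Collect the members of each set under its root.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return [tuple(members) for members in groups.values()]
```

Calling `connected_groups([[(1, 2), (3, 4), (2, 3)]])` should yield a single group containing 1, 2, 3 and 4. As with the breadth-first version, it is safest not to rely on the order of items inside each tuple.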

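If my original interpretation of your question was correct, and your input is a collection of key-value pairs that you want to aggregate, here is my original answer: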

Original answer

import collections

clusterer = collections.defaultdict(list)

for sublist in ListA:
    for k, v in sublist:
        clusterer[k].append(v)

# list of (key, value_list) pairs
output = list(clusterer.items())

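defaultdict(list) is a dict that automatically creates a list as the value for any key that does not exist yet. The loop goes over all the tuples, collecting all the values that match the same key, and then a list of (key, value_list) pairs is created from the defaultdict.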

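(The output of this code is not quite in the format you specified, but I think this format is more useful. If you want to change the format, that should be simple.)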
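For illustration, here is one possible way to reshape the aggregated result into the flat tuples shown in the question. It is a sketch under assumptions rather than part of the original answer: the name `flat_output` is hypothetical, and it relies on dicts preserving insertion order (Python 3.7+) and on tuple unpacking with `*` (Python 3.5+).

```python
import collections

ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

# Aggregate every value under its key, as in the original answer.
clusterer = collections.defaultdict(list)
for sublist in ListA:
    for k, v in sublist:
        clusterer[k].append(v)

# Flatten each key and its collected values into a single tuple.
flat_output = [(k, *values) for k, values in clusterer.items()]
# flat_output[0] == ('N1_10', 'N2_28', 'N3_98')
```

Note that this only merges tuples sharing the same first element. A case like (d,m) and (e,d) in the simplified version, where an item appears as a key in one tuple and as a value in another, would not be merged and needs the graph-based approach instead.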

2021-01-20