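I have a list of lists, and I'm trying to group or cluster them based on their items. A nested list starts a new group if none of its elements appear in the previous group.
Input: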
paths = [ ['D', 'B', 'A', 'H'], ['D', 'B', 'A', 'C'], ['H', 'A', 'C'], ['E', 'G', 'I'], ['F', 'G', 'I']]
My failed attempt:
paths = [
    ['D', 'B', 'A', 'H'],
    ['D', 'B', 'A', 'C'],
    ['H', 'A', 'C'],
    ['E', 'G', 'I'],
    ['F', 'G', 'I']
]
groups = []
paths_clone = paths
for path in paths:
    for node in path:
        for path_clone in paths_clone:
            if node in path_clone:
                if not path == path_clone:
                    groups.append([path, path_clone])
                else:
                    groups.append(path)
print(groups)
Expected output:
[
    [['D', 'B', 'A', 'H'], ['D', 'B', 'A', 'C'], ['H', 'A', 'C']],
    [['E', 'G', 'I'], ['F', 'G', 'I']]
]
Another example:
paths = [['shifter', 'barrel', 'barrel shifter'], ['ARM', 'barrel', 'barrel shifter'], ['IP power', 'IP', 'power'], ['ARM', 'barrel', 'shifter']]
Expected output groups:
output = [
    [['shifter', 'barrel', 'barrel shifter'], ['ARM', 'barrel', 'barrel shifter'], ['ARM', 'barrel', 'shifter']],
    [['IP power', 'IP', 'power']],
]
You are grouping by set membership, so use a set to detect when a new group starts:
def grouper(sequence):
    group, members = [], set()
    for item in sequence:
        if group and members.isdisjoint(item):
            # new group, yield and start new
            yield group
            group, members = [], set()
        group.append(item)
        members.update(item)
    yield group
This gives:
>>> for group in grouper(paths):
...     print(group)
...
[['D', 'B', 'A', 'H'], ['D', 'B', 'A', 'C'], ['H', 'A', 'C']]
[['E', 'G', 'I'], ['F', 'G', 'I']]
Or you can turn the result back into a list:
output = list(grouper(paths))
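This assumes that each group is contiguous in the input. If a group's members can be separated by items from other groups, you need to process the whole list and, for each item, loop over all the groups built so far:
def grouper(sequence):
    result = []  # will hold (members, group) tuples
    for item in sequence:
        for members, group in result:
            if members.intersection(item):
                # overlap: the item joins this group
                members.update(item)
                group.append(item)
                break
        else:
            # no group found, add new
            result.append((set(item), [item]))
    return [group for members, group in result]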
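One case that the loop above does not handle is an item that overlaps two groups that already exist: it joins the first match and the two groups stay separate. If that can happen in your data, a merging variant might look like the sketch below. The name `group_merging` is made up for illustration and is not part of the original answer, and it assumes every element is hashable.

```python
def group_merging(sequence):
    """Group lists that share any element, merging groups that an item bridges.

    Hypothetical variant of grouper(); an illustrative sketch only.
    """
    result = []  # (members, group) pairs, as in grouper()
    for item in sequence:
        item_set = set(item)
        # Every existing group that shares at least one element with the item.
        hits = [pair for pair in result if not pair[0].isdisjoint(item_set)]
        if not hits:
            # No overlap anywhere: start a new group.
            result.append((item_set, [item]))
            continue
        # Fold all overlapping groups into the earliest one, then add the item.
        members, group = hits[0]
        for other_members, other_group in hits[1:]:
            members |= other_members
            group.extend(other_group)
        members |= item_set
        group.append(item)
        # Drop the absorbed groups; compare by identity, not equality.
        absorbed = {id(pair) for pair in hits[1:]}
        result = [pair for pair in result if id(pair) not in absorbed]
    return [group for _, group in result]


# The second example from the question: the last path rejoins the first group.
paths = [
    ['shifter', 'barrel', 'barrel shifter'],
    ['ARM', 'barrel', 'barrel shifter'],
    ['IP power', 'IP', 'power'],
    ['ARM', 'barrel', 'shifter'],
]
grouped = group_merging(paths)

# A bridging input: ['b', 'c'] connects two groups that were separate so far.
bridged = group_merging([['a', 'b'], ['c', 'd'], ['b', 'c']])
```

Here `grouped` holds two groups, the three shifter-related paths and the single `'IP power'` path, and `bridged` holds one group containing all three lists. When groups are merged, the items inside the merged group are no longer guaranteed to be in input order.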