我有成千上万的1到100个数字的行,每行定义一组数字以及它们之间的关系。我需要获取相关数字集。
小例子:如果我有这7行数据
T1 T2 T3 T4 T5 T6 T1 T5 T4 T3 T4 T7
我需要一个不太慢的算法来知道这里的设置是:
T1 T2 T6 (because T1 is related with T2 in the first line and T1 related with T6 in the line 5) T3 T4 T5 T7 (because T5 is with T4 in line 6 and T3 is with T4 and T7 in line 7)
但是当您有非常大的集合时,很难在每个大集合中搜索T(x)并进行集合的并集…等等。
您是否有暗示以不太暴力的方式执行此操作?
我正在尝试在Python中执行此操作。
一旦建立了数据结构,您想对它执行什么查询?向我们展示您现有的代码。什么是T(x)?您说的是“数字组”,但样本数据显示了T1,T2等。请解释。
您是否已阅读以下内容:http : //en.wikipedia.org/wiki/Disjoint- set_data_structure
尝试查看以下Python实现:http : //code.activestate.com/recipes/215912-union-find-data- structure/
或者您可以自己抨击一些简单易懂的事情,例如
[更新:全新代码]
class DisjointSet(object): def __init__(self): self.leader = {} # maps a member to the group's leader self.group = {} # maps a group leader to the group (which is a set) def add(self, a, b): leadera = self.leader.get(a) leaderb = self.leader.get(b) if leadera is not None: if leaderb is not None: if leadera == leaderb: return # nothing to do groupa = self.group[leadera] groupb = self.group[leaderb] if len(groupa) < len(groupb): a, leadera, groupa, b, leaderb, groupb = b, leaderb, groupb, a, leadera, groupa groupa |= groupb del self.group[leaderb] for k in groupb: self.leader[k] = leadera else: self.group[leadera].add(b) self.leader[b] = leadera else: if leaderb is not None: self.group[leaderb].add(a) self.leader[a] = leaderb else: self.leader[a] = self.leader[b] = a self.group[a] = set([a, b]) data = """T1 T2 T3 T4 T5 T1 T3 T6 T7 T8 T3 T7 T9 TA T1 T9""" # data is chosen to demonstrate each of 5 paths in the code from pprint import pprint as pp ds = DisjointSet() for line in data.splitlines(): x, y = line.split() ds.add(x, y) print print x, y pp(ds.leader) pp(ds.group)
这是最后一步的输出:
T1 T9 {'T1': 'T1', 'T2': 'T1', 'T3': 'T3', 'T4': 'T3', 'T5': 'T1', 'T6': 'T3', 'T7': 'T3', 'T8': 'T3', 'T9': 'T1', 'TA': 'T1'} {'T1': set(['T1', 'T2', 'T5', 'T9', 'TA']), 'T3': set(['T3', 'T4', 'T6', 'T7', 'T8'])}