假设我有一个包含索引的列表[[start, end], [start1, end1], [start2, end2]]。
[[start, end], [start1, end1], [start2, end2]]
例如:
[[0, 133], [78, 100], [25, 30]]。
[[0, 133], [78, 100], [25, 30]]
我将如何检查列表之间的重叠并每次删除长度较长的列表?所以:
>>> list = [[0, 133], [78, 100], [25, 30]] >>> foo(list) [[78, 100], [25, 30]]
到目前为止,这是我尝试做的事情:
def cleanup_list(list): i = 0 c = 0 x = list[:] end = len(x) while i < end-1: for n in range(x[i][0], x[i][1]): if n in range(x[i+1][0], x[i+1][1]): list.remove(max(x[i], x[i+1])) i +=1 return list
但是除了令人费解之外,它还无法正常工作:
>>>cleanup_list([[0,100],[9,10],[12,90]]) [[0, 100], [12, 90]]
任何帮助,将不胜感激!
编辑:
其他示例是:
>>>a = [[0, 100], [4, 20], [30, 35], [30, 78]] >>>foo(a) [[4, 20], [30, 35]] >>>b = [[30, 70], [25, 40]] >>>foo(b) [[25, 40]]
我基本上是想删除所有与另一个列表重叠的最长列表。在这种情况下,我不必担心列表长度相等。
谢谢!!
为了从列表中删除最小数量的间隔,以使剩余的间隔不重叠,O(n*log n)存在算法:
O(n*log n)
def maximize_nonoverlapping_count(intervals): # sort by the end-point L = sorted(intervals, key=lambda (start, end): (end, (end - start)), reverse=True) # O(n*logn) iv = build_interval_tree(intervals) # O(n*log n) result = [] while L: # until there are intervals left to consider # pop the interval with the smallest end-point, keep it in the result result.append(L.pop()) # O(1) # remove intervals that overlap with the popped interval overlapping_intervals = iv.pop(result[-1]) # O(log n + m) remove(overlapping_intervals, from_=L) return result
它应产生以下结果:
f = maximize_nonoverlapping_count assert f([[0, 133], [78, 100], [25, 30]]) == [[25, 30], [78, 100]] assert f([[0,100],[9,10],[12,90]]) == [[9,10], [12, 90]] assert f([[0, 100], [4, 20], [30, 35], [30, 78]]) == [[4, 20], [30, 35]] assert f([[30, 70], [25, 40]]) == [[25, 40]]
它要求数据结构能够及时找到O(log n + m)与给定间隔重叠的所有间隔,例如IntervalTree。有一些可从Python使用的实现,例如,有关示例用法quicksect.py,请参见快速间隔交集方法。
O(log n + m)
IntervalTree
quicksect.py
这是上述算法的quicksect基于O(n**2)实现:
quicksect
O(n**2)
from quicksect import IntervalNode class Interval(object): def __init__(self, start, end): self.start = start self.end = end self.removed = False def maximize_nonoverlapping_count(intervals): intervals = [Interval(start, end) for start, end in intervals] # sort by the end-point intervals.sort(key=lambda x: (x.end, (x.end - x.start))) # O(n*log n) tree = build_interval_tree(intervals) # O(n*log n) result = [] for smallest in intervals: # O(n) (without the loop body) # pop the interval with the smallest end-point, keep it in the result if smallest.removed: continue # skip removed nodes smallest.removed = True result.append([smallest.start, smallest.end]) # O(1) # remove (mark) intervals that overlap with the popped interval tree.intersect(smallest.start, smallest.end, # O(log n + m) lambda x: setattr(x.other, 'removed', True)) return result def build_interval_tree(intervals): root = IntervalNode(intervals[0].start, intervals[0].end, other=intervals[0]) return reduce(lambda tree, x: tree.insert(x.start, x.end, other=x), intervals[1:], root)
注意:在最坏情况下的时间复杂度是O(n**2)针对此实现的,因为仅将间隔标记为已删除,例如,想象一下这样的输入intervals,len(result) == len(intervals) / 3并且存在len(intervals) / 2跨越整个范围的间隔,tree.intersect()则将被称为“ n/3时间”,而每次调用将x.other.removed = True至少执行n/2一次,即,n*n/6总共操作:
intervals
len(result) == len(intervals) / 3
len(intervals) / 2
tree.intersect()
n/3
x.other.removed = True
n/2
n*n/6
n = 6 intervals = [[0, 100], [0, 100], [0, 100], [0, 10], [10, 20], [15, 40]]) result = [[0, 10], [10, 20]]
这是一个banyan基于的O(n log n)实现:
banyan
O(n log n)
from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan def maximize_nonoverlapping_count(intervals): # sort by the end-point O(n log n) sorted_intervals = SortedSet(intervals, key=lambda (start, end): (end, (end - start))) # build "interval" tree O(n log n) tree = SortedSet(intervals, updator=OverlappingIntervalsUpdator) result = [] while sorted_intervals: # until there are intervals left to consider # pop the interval with the smallest end-point, keep it in the result result.append(sorted_intervals.pop()) # O(log n) # remove intervals that overlap with the popped interval overlapping_intervals = tree.overlap(result[-1]) # O(m log n) tree -= overlapping_intervals # O(m log n) sorted_intervals -= overlapping_intervals # O(m log n) return result
注:此实现认为[0, 10]和[10, 20]间隔是重叠的:
[0, 10]
[10, 20]
f = maximize_nonoverlapping_count assert f([[0, 100], [0, 10], [11, 20], [15, 40]]) == [[0, 10] ,[11, 20]] assert f([[0, 100], [0, 10], [10, 20], [15, 40]]) == [[0, 10] ,[15, 40]]
sorted_intervals并tree可以合并:
sorted_intervals
tree
from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan def maximize_nonoverlapping_count(intervals): # build "interval" tree sorted by the end-point O(n log n) tree = SortedSet(intervals, key=lambda (start, end): (end, (end - start)), updator=OverlappingIntervalsUpdator) result = [] while tree: # until there are intervals left to consider # pop the interval with the smallest end-point, keep it in the result result.append(tree.pop()) # O(log n) # remove intervals that overlap with the popped interval overlapping_intervals = tree.overlap(result[-1]) # O(m log n) tree -= overlapping_intervals # O(m log n) return result