Finding gaps in a sequence of strings

小编典典

python

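I have a sequence of strings: 0000001, 0000002, 0000003, ... up to about 2 million. They are not contiguous; that is, there are gaps. For example, after 0000003 the next string might be 0000006. I need to find all such gaps (in this case, 0000004 and 0000005).

Here is what I have so far: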

def get_gaps(curr_ids):
    gaps = []
    total = len(curr_ids)

    for i in range(total):
        tmp_id = str(i).zfill(7)
        # curr_ids is a list, so each membership test scans it: O(n) per lookup
        if tmp_id not in curr_ids:
            gaps.append(tmp_id)
    return gaps

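But as you might have guessed, this is slow because curr_ids is a list. If I pre-populated a dict with curr_ids, the lookups would be faster. But what is the complexity of populating the hash table? What is the fastest way to do this?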

2021-01-20

1 answer

小编典典

You can sort the list of IDs and then make a single pass over it:

def find_gaps(ids):
    """Generate the gaps in the list of ids."""
    j = 1
    for id_i in sorted(ids):
        while True:
            id_j = '%07d' % j
            j += 1
            if id_j >= id_i:
                break
            yield id_j

>>> list(find_gaps(["0000001", "0000003", "0000006"]))
['0000002', '0000004', '0000005']

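If the input list is already in order, you can skip the call to sorted (although it does little harm: Python's adaptive mergesort, Timsort, runs in O(n) when the list is already sorted).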
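
On the hash-table part of the question: building a set (or dict) from n IDs takes O(n) time on average, and each membership test afterwards is O(1) on average, so a set-based scan is also linear overall. The following is only a minimal sketch of that alternative, not part of the answer above. It assumes the IDs are zero-padded decimal strings of a fixed width and that numbering starts at 1; the name find_gaps_with_set and the width parameter are hypothetical.

```python
def find_gaps_with_set(ids, width=7):
    """Return the missing zero-padded IDs between 1 and the largest ID present.

    Assumes every ID is a decimal string padded to `width` digits
    and that numbering starts at 1 (both are assumptions of this sketch).
    """
    present = set(ids)  # O(n) on average to build; lookups are O(1) on average
    if not present:
        return []
    highest = max(int(s) for s in present)
    gaps = []
    for n in range(1, highest):
        candidate = str(n).zfill(width)
        if candidate not in present:
            gaps.append(candidate)
    return gaps


print(find_gaps_with_set(["0000001", "0000003", "0000006"]))
```

Unlike the sorted single pass, this version does not need the input in any order and tolerates duplicate IDs, at the cost of holding a second copy of the IDs in memory.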

2021-01-20