我试图找到该算法的时间复杂度。
迭代:算法从输入的位串产生给定汉明距离内的所有位串。它生成所有递增的序列0 <= a[0] < ... < a[dist-1] <strlen(num),并还原相应索引处的位。
0 <= a[0] < ... < a[dist-1] <strlen(num)
该向量a应该保留必须反转位的索引。因此,如果a包含当前索引i,我们将输出1而不是0,反之亦然。否则,我们按原样打印该位(请参见else- part),如下所示:
a
i
// e.g. hamming("0000", 2); void hamming(const char* num, size_t dist) { assert(dist > 0); vector<int> a(dist); size_t k = 0, n = strlen(num); a[k] = -1; while (true) if (++a[k] >= n) if (k == 0) return; else { --k; continue; } else if (k == dist - 1) { // this is an O(n) operation and will be called // (n choose dist) times, in total. print(num, a); } else { a[k+1] = a[k]; ++k; } }
该算法的时间复杂度是多少?
我的尝试说:
dist * n +(n选择t)* n + 2
但这似乎不正确,请考虑以下示例,所有示例都具有dist = 2:
len = 3, (3 choose 2) = 3 * O(n), 10 while iterations len = 4, (4 choose 2) = 6 * O(n), 15 while iterations len = 5, (5 choose 2) = 9 * O(n), 21 while iterations len = 6, (6 choose 2) = 15 * O(n), 28 while iterations
这是两个代表性的运行(打印将在循环开始时进行):
000, len = 3 k = 0, total_iter = 1 vector a = -1 0 k = 1, total_iter = 2 vector a = 0 0 Paid O(n) k = 1, total_iter = 3 vector a = 0 1 Paid O(n) k = 1, total_iter = 4 vector a = 0 2 k = 0, total_iter = 5 vector a = 0 3 k = 1, total_iter = 6 vector a = 1 1 Paid O(n) k = 1, total_iter = 7 vector a = 1 2 k = 0, total_iter = 8 vector a = 1 3 k = 1, total_iter = 9 vector a = 2 2 k = 0, total_iter = 10 vector a = 2 3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gsamaras@pythagoras:~/Desktop/generate_bitStrings_HammDistanceT$ ./iter 0000, len = 4 k = 0, total_iter = 1 vector a = -1 0 k = 1, total_iter = 2 vector a = 0 0 Paid O(n) k = 1, total_iter = 3 vector a = 0 1 Paid O(n) k = 1, total_iter = 4 vector a = 0 2 Paid O(n) k = 1, total_iter = 5 vector a = 0 3 k = 0, total_iter = 6 vector a = 0 4 k = 1, total_iter = 7 vector a = 1 1 Paid O(n) k = 1, total_iter = 8 vector a = 1 2 Paid O(n) k = 1, total_iter = 9 vector a = 1 3 k = 0, total_iter = 10 vector a = 1 4 k = 1, total_iter = 11 vector a = 2 2 Paid O(n) k = 1, total_iter = 12 vector a = 2 3 k = 0, total_iter = 13 vector a = 2 4 k = 1, total_iter = 14 vector a = 3 3 k = 0, total_iter = 15 vector a = 3 4
while循环有点巧妙和微妙,并且可以说它正在做两种不同的事情(如果算上的初始化,甚至是三件事a)。这就是使您的复杂性计算具有挑战性的原因,并且效率也比以前低。
在摘要中,要从当前索引中递增地计算下一组索引,其想法是找到最后一个索引,该索引i小于n-dist+i,将其递增,然后将以下索引设置为a[i]+1,a[i]+2依此类推。
n-dist+i
a[i]+1
a[i]+2
例如,如果dist = 5,n = 11并且您的索引是:
0, 3, 5, 9, 10
然后5最后一个值小于n-dist+i(因为n-dist是6,并且10 = 6 + 4,9 = 6 + 3,但是5 <6 + 2)。
5
n-dist
因此,我们增加5,并设置后续整数以获取索引集:
0, 3, 6, 7, 8
现在假设您的代码如何运行,假设 k=4
k=4
a[k] + 1
k
++a[k]
a[k+1]
print
这段代码是正确的,但是效率不高,因为k它在搜索可以递增而不会引起较高索引溢出的最高索引时,会前后移动。实际上,如果最高索引是j从末尾开始的,则代码使用while循环的非线性数字迭代。如果跟踪n==dist不同值的while循环发生了多少次迭代,则可以轻松地自己演示一下n。仅存在一行输出,但是您会看到迭代次数增加了O(2^ n)(实际上,您会看到2 ^(n + 1)-2个迭代)。
j
n==dist
n
这种划分会使您的代码不必要地效率低下,并且也难以分析。
相反,您可以以更直接的方式编写代码:
void hamming2(const char* num, size_t dist) { int a[dist]; for (int i = 0; i < dist; i++) { a[i] = i; } size_t n = strlen(num); while (true) { print(num, a); int i; for (i = dist - 1; i >= 0; i--) { if (a[i] < n - dist + i) break; } if (i < 0) return; a[i]++; for (int j = i+1; j<dist; j++) a[j] = a[i] + j - i; } }
现在,每次通过while循环时,都会产生一组新的索引。每次迭代的确切成本并不简单,但是由于print是O(n),而while循环中的其余代码处于最差的O(dist),因此总成本为O(N_INCR_SEQ(n,dist)* n),其中N_INCR_SEQ(n,dist)是自然数<长度distn的自然数的递增序列数。注释中的某人提供了为此提供公式的链接。