这不是标准的分区问题,因为我需要维护列表中元素的顺序。
例如,如果我有一个列表
[1, 6, 2, 3, 4, 1, 7, 6, 4]
我想要两个块,然后拆分应该给
[[1, 6, 2, 3, 4, 1], [7, 6, 4]]
每边总共17个。对于三块,结果将是
[[1, 6, 2, 3], [4, 1, 7], [6, 4]]
总计为12、12和10。
编辑以获取其他说明
我目前将总和除以块数并将其用作目标,然后进行迭代直到接近该目标。问题在于某些数据集可能会使算法混乱,例如,尝试将以下内容分为3:-
[95, 15, 75, 25, 85, 5]
总和为300,目标为100。第一个块的总和为95,第二个块的总和为90,第三个块的总和为110,而5为“剩余”。将其追加到应有的位置将得到95、90、115,而更“合理”的解决方案将是110、100、90。
结束编辑
背景:
我有一个包含高度不同的文本(歌词)的列表,我想将文本分为任意数量的列。目前,我基于所有行的总高度计算目标高度,但是显然这是一个始终被低估的情况,这在某些情况下会导致次优解决方案(最后一列明显更高)。
这种方法定义了分区边界,该分区边界将数组划分为大致相等数量的元素,然后反复搜索更好的分区,直到找不到更多分区为止。它与大多数其他已发布的解决方案的不同之处在于,它通过尝试多个不同的分区来寻找最佳解决方案。其他解决方案尝试通过数组的一次通过来创建良好的分区,但是我想不出保证最佳的一次通过算法。
此处的代码是该算法的有效实现,但可能很难理解,因此在末尾包含了更具可读性的版本作为附录。
def partition_list(a, k): if k <= 1: return [a] if k >= len(a): return [[x] for x in a] partition_between = [(i+1)*len(a)/k for i in range(k-1)] average_height = float(sum(a))/k best_score = None best_partitions = None count = 0 while True: starts = [0]+partition_between ends = partition_between+[len(a)] partitions = [a[starts[i]:ends[i]] for i in range(k)] heights = map(sum, partitions) abs_height_diffs = map(lambda x: abs(average_height - x), heights) worst_partition_index = abs_height_diffs.index(max(abs_height_diffs)) worst_height_diff = average_height - heights[worst_partition_index] if best_score is None or abs(worst_height_diff) < best_score: best_score = abs(worst_height_diff) best_partitions = partitions no_improvements_count = 0 else: no_improvements_count += 1 if worst_height_diff == 0 or no_improvements_count > 5 or count > 100: return best_partitions count += 1 move = -1 if worst_height_diff < 0 else 1 bound_to_move = 0 if worst_partition_index == 0\ else k-2 if worst_partition_index == k-1\ else worst_partition_index-1 if (worst_height_diff < 0) ^ (heights[worst_partition_index-1] > heights[worst_partition_index+1])\ else worst_partition_index direction = -1 if bound_to_move < worst_partition_index else 1 partition_between[bound_to_move] += move * direction def print_best_partition(a, k): print 'Partitioning {0} into {1} partitions'.format(a, k) p = partition_list(a, k) print 'The best partitioning is {0}\n With heights {1}\n'.format(p, map(sum, p)) a = [1, 6, 2, 3, 4, 1, 7, 6, 4] print_best_partition(a, 1) print_best_partition(a, 2) print_best_partition(a, 3) print_best_partition(a, 4) b = [1, 10, 10, 1] print_best_partition(b, 2) import random c = [random.randint(0,20) for x in range(100)] print_best_partition(c, 10) d = [95, 15, 75, 25, 85, 5] print_best_partition(d, 3)
根据您要执行的操作,可能需要进行一些修改。例如,要确定是否已找到最佳分区,该算法将在分区之间没有高度差时停止,它找不到比连续5次以上或100次迭代后看到的最佳结果更好的东西总迭代作为万能的停止点。您可能需要调整这些常数或使用其他方案。如果您的身高构成了复杂的价值观,那么知道何时停止就可能陷入试图逃脱局部最大值之类的经典问题。
Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 1 partitions The best partitioning is [[1, 6, 2, 3, 4, 1, 7, 6, 4]] With heights [34] Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 2 partitions The best partitioning is [[1, 6, 2, 3, 4, 1], [7, 6, 4]] With heights [17, 17] Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 3 partitions The best partitioning is [[1, 6, 2, 3], [4, 1, 7], [6, 4]] With heights [12, 12, 10] Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 4 partitions The best partitioning is [[1, 6], [2, 3, 4], [1, 7], [6, 4]] With heights [7, 9, 8, 10] Partitioning [1, 10, 10, 1] into 2 partitions The best partitioning is [[1, 10], [10, 1]] With heights [11, 11] Partitioning [7, 17, 17, 1, 8, 8, 12, 0, 10, 20, 17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9, 12, 3, 18, 9, 6, 7, 19, 20, 17, 7, 4, 3, 16, 20, 6, 7, 12, 16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16, 14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5, 13, 16, 0, 16, 7, 3, 8, 1, 20, 16, 11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18, 20, 3, 10, 9, 13, 12, 15, 6, 14, 16, 6, 12, 9, 9, 16, 14, 19, 1] into 10 partitions The best partitioning is [[7, 17, 17, 1, 8, 8, 12, 0, 10, 20], [17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9], [12, 3, 18, 9, 6, 7, 19, 20], [17, 7, 4, 3, 16, 20, 6, 7, 12], [16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16], [14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5], [13, 16, 0, 16, 7, 3, 8, 1, 20, 16], [11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18], [20, 3, 10, 9, 13, 12, 15, 6, 14], [16, 6, 12, 9, 9, 16, 14, 19, 1]] With heights [100, 95, 94, 92, 90, 87, 100, 93, 102, 102] Partitioning [95, 15, 75, 25, 85, 5] into 3 partitions The best partitioning is [[95, 15], [75, 25], [85, 5]] With heights [110, 100, 90]
添加了此方法可以正确处理的新测试用例[95、15、75、25、85、5]。
该版本的算法更易于阅读和理解,但由于较少利用内置的Python功能,因此版本更长。但是,它似乎可以在相当甚至更快的时间内执行。
#partition list a into k partitions def partition_list(a, k): #check degenerate conditions if k <= 1: return [a] if k >= len(a): return [[x] for x in a] #create a list of indexes to partition between, using the index on the #left of the partition to indicate where to partition #to start, roughly partition the array into equal groups of len(a)/k (note #that the last group may be a different size) partition_between = [] for i in range(k-1): partition_between.append((i+1)*len(a)/k) #the ideal size for all partitions is the total height of the list divided #by the number of paritions average_height = float(sum(a))/k best_score = None best_partitions = None count = 0 no_improvements_count = 0 #loop over possible partitionings while True: #partition the list partitions = [] index = 0 for div in partition_between: #create partitions based on partition_between partitions.append(a[index:div]) index = div #append the last partition, which runs from the last partition divider #to the end of the list partitions.append(a[index:]) #evaluate the partitioning worst_height_diff = 0 worst_partition_index = -1 for p in partitions: #compare the partition height to the ideal partition height height_diff = average_height - sum(p) #if it's the worst partition we've seen, update the variables that #track that if abs(height_diff) > abs(worst_height_diff): worst_height_diff = height_diff worst_partition_index = partitions.index(p) #if the worst partition from this run is still better than anything #we saw in previous iterations, update our best-ever variables if best_score is None or abs(worst_height_diff) < best_score: best_score = abs(worst_height_diff) best_partitions = partitions no_improvements_count = 0 else: no_improvements_count += 1 #decide if we're done: if all our partition heights are ideal, or if #we haven't seen improvement in >5 iterations, or we've tried 100 #different partitionings #the criteria to exit are important for getting a good result with #complex data, and changing them is a good way to experiment with getting #improved results if worst_height_diff == 0 or no_improvements_count > 5 or count > 100: return best_partitions count += 1 #adjust the partitioning of the worst partition to move it closer to the #ideal size. the overall goal is to take the worst partition and adjust #its size to try and make its height closer to the ideal. generally, if #the worst partition is too big, we want to shrink the worst partition #by moving one of its ends into the smaller of the two neighboring #partitions. if the worst partition is too small, we want to grow the #partition by expanding the partition towards the larger of the two #neighboring partitions if worst_partition_index == 0: #the worst partition is the first one if worst_height_diff < 0: partition_between[0] -= 1 #partition too big, so make it smaller else: partition_between[0] += 1 #partition too small, so make it bigger elif worst_partition_index == len(partitions)-1: #the worst partition is the last one if worst_height_diff < 0: partition_between[-1] += 1 #partition too small, so make it bigger else: partition_between[-1] -= 1 #partition too big, so make it smaller else: #the worst partition is in the middle somewhere left_bound = worst_partition_index - 1 #the divider before the partition right_bound = worst_partition_index #the divider after the partition if worst_height_diff < 0: #partition too big, so make it smaller if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the right bigger partition_between[right_bound] -= 1 else: #the partition on the left is smaller than the one on the right, so make the one on the left bigger partition_between[left_bound] += 1 else: #partition too small, make it bigger if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the left smaller partition_between[left_bound] -= 1 else: #the partition on the left is smaller than the one on the right, so make the one on the right smaller partition_between[right_bound] += 1 def print_best_partition(a, k): #simple function to partition a list and print info print ' Partitioning {0} into {1} partitions'.format(a, k) p = partition_list(a, k) print ' The best partitioning is {0}\n With heights {1}\n'.format(p, map(sum, p)) #tests a = [1, 6, 2, 3, 4, 1, 7, 6, 4] print_best_partition(a, 1) print_best_partition(a, 2) print_best_partition(a, 3) print_best_partition(a, 4) print_best_partition(a, 5) b = [1, 10, 10, 1] print_best_partition(b, 2) import random c = [random.randint(0,20) for x in range(100)] print_best_partition(c, 10) d = [95, 15, 75, 25, 85, 5] print_best_partition(d, 3)