多亏了user3125280,DW和Evgeny Kluev,问题才得以更新。
我有一个网页列表,我必须经常下载它们,每个网页都有不同的下载频率。基于此频率,我们将网页分为5组:
Items in group 1 are downloaded once per 1 hour items in group 2 once per 2 hours items in group 3 once per 4 hours items in group 4 once per 12 hours items in group 5 once per 24 hours
这意味着,我们必须在1小时内下载所有第1组网页,在2小时内下载所有第2组,依此类推。
我正在尝试制定一种算法。作为输入,我有:
a)DATA_ARR=一个有5个数字的数组。每个数字代表该组中的项目数。
DATA_ARR
b)TIME_ARR=一个数组,带有5个数字(1、2、4、12、24),表示将多久下载一次该项。
TIME_ARR
b)X=每小时要下载的网页总数。这是使用items_in_group / download_frequently计算并向上舍入的。If we have 15 items in group 5, and 3 items in group 4, this will be 15/24 + 3/12 = 0.875 and rounded is 1.
X
If we have 15 items in group 5, and 3 items in group 4, this will be 15/24 + 3/12 = 0.875 and rounded is 1.
我的程序必须每小时在最大X站点上下载一次。我希望算法输出类似:
Hour 1: A1 B0 C4 D5 Hour 2: A2 B1 C2 D2 ...
A1 =第一组的第二项 C0 =第三组的第一项
我的算法必须尽可能高效。这表示:
a)该模式必须 扩展 到至少200多个小时 二) 没有必要创建一个可重复的模式 c)的空间 需要 的时候能够以使用绝对最小带宽 d) 从来没有下载的项目往往比更新频率, 没有例外
例:
group 1: 0 items | once per 1 hour group 2: 3 items | once per 2 hours group 3: 4 items | once per 4 hours group 4: 0 items | once per 12 hours group 5: 0 items | once per 24 hours
我们计算每小时可带走的物品数量: 3/2+4/4 = 2.5. We round this upwards and it's 3.
3/2+4/4 = 2.5. We round this upwards and it's 3.
使用铅笔和纸,我们可以找到以下解决方案:
Hour 1: B0 C0 B1 Hour 2: B2 C1 c2 Hour 3: B0 C3 B1 Hour 4: B2 Hour 5: B0 C0 B1 Hour 6: B2 C1 c2 Hour 7: B0 C3 B1 Hour 8: B2 Hour 9: B0 C0 B1 Hour 10: B2 C1 c2 Hour 11: B0 C3 B1 Hour 12: B2 Hour 13: B0 C0 B1 Hour 14: B2 C1 c2 and continue the above.
我们每隔4小时服用C0,C1 C2和C3。我们还采取B0,B1并且B2每隔2小时。
C0
C1
C2
C3
B0
B1
B2
问题:请向我解释一下,如何设计一种能够使用绝对最小下载次数来下载项目的算法? 蛮力 不是 解决方案,算法必须在CPU方面高效,因为元素数量可能很大。
您可以阅读此处发布的答案:https : //cs.stackexchange.com/a/19422/12497以及user3125280在下面发布的答案。
您的问题是典型的计划问题。这些问题在计算机科学中得到了很好的研究,因此有大量文献可供参考。
该代码有点像Deficit round robin,但有一些简化。首先,我们通过添加data_to_process变量来自己填充队列。其次,队列只是循环访问值列表。
data_to_process
一个区别是,除数学误差外,该解决方案将获得所需的最佳值。
粗略的草图:尚未编译(c ++ 11) 基于unix的规范代码
#include <iostream> #include <vector> #include <numeric> #include <unistd.h> //#include <cmath> //for ceil #define TIME_SCALE ((double)60.0) //1 for realtime speed //Assuming you are not refreshing ints in the real case template<typename T> struct queue { const std::vector<T> data; //this will be filled with numbers int position; double refresh_rate; //must be refreshed ever ~ hours double data_rate; //this many refreshes per hour double credit; //amount of refreshes owed queue(std::initializer_list<T> v, int r ) : data(v), position(0), refresh_rate(r), credit(0) { data_rate = data.size() / (double) refresh_rate; } int getNext() { return data[position++ % data.size()]; } }; double time_passed(){ static double total; //if(total < 20){ //stop early usleep(60000000 / TIME_SCALE); //sleep for a minute total += 1.0 / 60.0; //add a minute std::cout << "Time: " << total << std::endl; return 1.0; //change to 1.0 / 60.0 for real time speed //} else return 0; } int main() { //keep a list of the queues std::vector<queue<int> > queues{ {{1, 2, 3}, 2}, {{1, 2, 3, 4}, 3}}; double total_data_rate = 0; for(auto q : queues) total_data_rate += q.data_rate; double data_to_process = 0; //how many refreshes we have to do int queue_number = 0; //which queue we are processing auto current_queue = &queues[0]; while(1) { data_to_process += time_passed() * total_data_rate; //data_to_process = ceil(data_to_process) //optional while(data_to_process >= 1){ //data_to_process >= 0 will make the the scheduler more //eager in the first time period (ie. everything will updated correctly //in the first period and and following periods if(current_queue->credit >= 1){ //don't change here though, since credit determines the weighting only, //not how many refreshes are made //refresh(current_queue.getNext(); std::cout << "From queue " << queue_number << " refreshed " << current_queue->getNext() << std::endl; current_queue->credit -= 1; data_to_process -= 1; } else { queue_number = (queue_number + 1) % queues.size(); current_queue = &queues[queue_number]; current_queue->credit += current_queue->data_rate; } } } return 0; }
该示例现在应使用–std = c ++ 11在gcc上进行编译,并为您提供所需的内容。
这是测试用例的输出:(用于非时间缩放的早期代码)
Time: 0 From queue 1 refreshed 1 From queue 0 refreshed 1 From queue 1 refreshed 2 Time: 1 From queue 0 refreshed 2 From queue 0 refreshed 3 From queue 1 refreshed 3 Time: 2 From queue 0 refreshed 1 From queue 1 refreshed 4 From queue 1 refreshed 1 Time: 3 From queue 0 refreshed 2 From queue 0 refreshed 3 From queue 1 refreshed 2 Time: 4 From queue 0 refreshed 1 From queue 1 refreshed 3 From queue 0 refreshed 2 Time: 5 From queue 0 refreshed 3 From queue 1 refreshed 4 From queue 1 refreshed 1
作为扩展,通过允许此调度程序仅完成前一个lcm(update_rate * lcm(… refresh rate …),ceil(update_rate))步骤来回答重复模式问题,然后重复该模式。
还:实际上,有时由于小时限制的要求,这将无法解决。当我使用您无法解决的示例并修改time_passed以返回0.1时,将通过每1.1小时更新一次(而不是在小时界限内)来解决时间表。