在各种情况下,我观察到C ++中的链表迭代始终比Go中的慢10-15%。我在解决堆栈溢出这个神秘的第一次尝试是在这里。我编写的示例有问题,因为:
1)由于堆分配,内存访问是不可预测的,并且
2)因为没有实际的工作要做,所以一些人的编译器正在优化主循环。
为了解决这些问题,我有一个新程序,其中包含C 和Go的实现。C 版本花费1.75秒,而Go版本花费1.48秒。这次,我在计时开始之前进行了一次大堆分配,并使用它来操作对象池,并从中释放并获取链表的节点。这样,两种实现之间的内存访问应该完全相似。
希望这可以使奥秘重现!
C ++:
#include <iostream> #include <sstream> #include <fstream> #include <string> #include <vector> #include <boost/timer.hpp> using namespace std; struct Node { Node *next; // 8 bytes int age; // 4 bytes }; // Object pool, where every free slot points to the previous free slot template<typename T, int n> struct ObjPool { typedef T* pointer; typedef pointer* metapointer; ObjPool() : _top(NULL), _size(0) { pointer chunks = new T[n]; for (int i=0; i < n; i++) { release(&chunks[i]); } } // Giver an available pointer to the object pool void release(pointer ptr) { // Store the current pointer at the given address *(reinterpret_cast<metapointer>(ptr)) = _top; // Advance the pointer _top = ptr; // Increment the size ++_size; } // Pop an available pointer off the object pool for program use pointer acquire(void) { if(_size == 0){throw std::out_of_range("");} // Pop the top of the stack pointer retval = _top; // Step back to the previous address _top = *(reinterpret_cast<metapointer>(_top)); // Decrement the size --_size; // Return the next free address return retval; } unsigned int size(void) const {return _size;} protected: pointer _top; // Number of free slots available unsigned int _size; }; Node *nodes = nullptr; ObjPool<Node, 1000> p; void processAge(int age) { // If the object pool is full, pop off the head of the linked list and release // it from the pool if (p.size() == 0) { Node *head = nodes; nodes = nodes->next; p.release(head); } // Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes. Node *node = nodes; Node *prev = nullptr; while (true) { if (node == nullptr || age < node->age) { Node *newNode = p.acquire(); newNode->age = age; newNode->next = node; if (prev == nullptr) { nodes = newNode; } else { prev->next = newNode; } return; } prev = node; node = node->next; } } int main() { Node x = {}; std::cout << "Size of struct: " << sizeof(x) << "\n"; // 16 bytes boost::timer t; for (int i=0; i<1000000; i++) { processAge(i); } std::cout << t.elapsed() << "\n"; }
走:
package main import ( "time" "fmt" "unsafe" ) type Node struct { next *Node // 8 bytes age int32 // 4 bytes } // Every free slot points to the previous free slot type NodePool struct { top *Node size int } func NewPool(n int) NodePool { p := NodePool{nil, 0} slots := make([]Node, n, n) for i := 0; i < n; i++ { p.Release(&slots[i]) } return p } func (p *NodePool) Release(l *Node) { // Store the current top at the given address *((**Node)(unsafe.Pointer(l))) = p.top p.top = l p.size++ } func (p *NodePool) Acquire() *Node { if p.size == 0 { fmt.Printf("Attempting to pop from empty pool!\n") } retval := p.top // Step back to the previous address in stack of addresses p.top = *((**Node)(unsafe.Pointer(p.top))) p.size-- return retval } func processAge(age int32) { // If the object pool is full, pop off the head of the linked list and release // it from the pool if p.size == 0 { head := nodes nodes = nodes.next p.Release(head) } // Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes. node := nodes var prev *Node = nil for true { if node == nil || age < node.age { newNode := p.Acquire() newNode.age = age newNode.next = node if prev == nil { nodes = newNode } else { prev.next = newNode } return } prev = node node = node.next } } // Linked list of nodes, in ascending order by age var nodes *Node = nil var p NodePool = NewPool(1000) func main() { x := Node{}; fmt.Printf("Size of struct: %d\n", unsafe.Sizeof(x)) // 16 bytes start := time.Now() for i := 0; i < 1000000; i++ { processAge(int32(i)) } fmt.Printf("Time elapsed: %s\n", time.Since(start)) }
输出:
clang++ -std=c++11 -stdlib=libc++ minimalPool.cpp -O3; ./a.out Size of struct: 16 1.7548 go run minimalPool.go Size of struct: 16 Time elapsed: 1.487930629s
这两个程序之间的最大区别是,Go代码会忽略错误(如果幸运的话,如果您清空池,则会忽略错误(如果发生错误,将会恐慌或出现段错误),而C ++代码则会通过异常传播错误。比较:
if p.size == 0 { fmt.Printf("Attempting to pop from empty pool!\n") }
与
if(_size == 0){throw std::out_of_range("");}
至少有三种方法1可以使比较公平:
panic
abort
因此,让我们全部完成并比较结果3:
所以:
为什么?该异常实际上不会在您的测试运行中发生,因此实际的错误处理代码绝不会以任何一种语言运行。但是clang无法证明它不会发生。而且,由于您永远不会catch在任何地方出现异常,这意味着它必须针对整个堆栈中的每个非遗漏帧发出异常处理程序和堆栈展开器。因此,它在每个函数调用和返回上都要做更多的工作- 不需要 做 更多的工作,但是随后您的函数做的实际工作却很少,以至于不必要的额外工作加起来了。
clang
catch
1.您还可以更改C ++版本以执行C风格的错误处理,或使用Option类型,以及可能的其他可能性。
2.当然,这需要进行很多更改:您需要导入errors,将的返回类型更改Acquire为(*Node, error),将的返回类型更改processAge为error,更改您的所有return语句并至少添加两个if err != nil { … }检查。但这对Go来说应该是一件好事,对吧?
errors
Acquire
(*Node, error)
processAge
error
return
if err != nil { … }
3.当我在这,我换成你的遗产boost::timer有boost::auto_cpu_timer,所以我们现在看到的挂钟时间(如围棋)以及CPU时间。
boost::timer
boost::auto_cpu_timer
4.我不会尝试解释原因,因为我不理解。快速浏览一下组装,显然已经对某些检查进行了优化,但是我不明白为什么如果没有,就无法优化相同的检查panic。