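I am trying to figure out why a modified C program runs faster than its unmodified counterpart (I added only a few lines of code to perform some additional work). In this context, I suspect "cache effects" to be the main explanation (instruction cache). That led me to the perf profiling tool (https://perf.wiki.kernel.org/index.php/Main_Page), but unfortunately I cannot understand the meaning of its output regarding cache misses.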
perf provides several cache-related events:
  cache-references                 [Hardware event]
  cache-misses                     [Hardware event]
  L1-dcache-loads                  [Hardware cache event]
  L1-dcache-load-misses            [Hardware cache event]
  L1-dcache-stores                 [Hardware cache event]
  L1-dcache-store-misses           [Hardware cache event]
  L1-dcache-prefetches             [Hardware cache event]
  L1-dcache-prefetch-misses        [Hardware cache event]
  L1-icache-loads                  [Hardware cache event]
  L1-icache-load-misses            [Hardware cache event]
  L1-icache-prefetches             [Hardware cache event]
  L1-icache-prefetch-misses        [Hardware cache event]
  LLC-loads                        [Hardware cache event]
  LLC-load-misses                  [Hardware cache event]
  LLC-stores                       [Hardware cache event]
  LLC-store-misses                 [Hardware cache event]
  LLC-prefetches                   [Hardware cache event]
  LLC-prefetch-misses              [Hardware cache event]
  dTLB-loads                       [Hardware cache event]
  dTLB-load-misses                 [Hardware cache event]
  dTLB-stores                      [Hardware cache event]
  dTLB-store-misses                [Hardware cache event]
  dTLB-prefetches                  [Hardware cache event]
  dTLB-prefetch-misses             [Hardware cache event]
  iTLB-loads                       [Hardware cache event]
  iTLB-load-misses                 [Hardware cache event]
  branch-loads                     [Hardware cache event]
  branch-load-misses               [Hardware cache event]
  node-loads                       [Hardware cache event]
  node-load-misses                 [Hardware cache event]
  node-stores                      [Hardware cache event]
  node-store-misses                [Hardware cache event]
  node-prefetches                  [Hardware cache event]
  node-prefetch-misses             [Hardware cache event]
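Where can I find an explanation of these fields? The cache-misses count is always smaller than the counts of the other events. What does this event measure?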
In the example below, how should I interpret the 26,760 L1-icache-load-misses for ls versus the 5,708 cache-misses?
  perf stat -e L1-icache-load-misses ls

  caches  caches~  out

   Performance counter stats for 'ls':

              26,760 L1-icache-load-misses

         0.002816690 seconds time elapsed

  perf stat -e cache-misses ls

  caches  caches~  out

   Performance counter stats for 'ls':

               5,708 cache-misses

         0.002822122 seconds time elapsed
You seem to think that the cache-misses event is the sum of all the other kinds of cache misses (L1-dcache-load-misses, and so on). That is actually not true.
The cache-misses event represents the number of memory accesses that could not be served by any of the caches.
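I admit that perf's documentation is not the best around.
However, you can learn quite a lot by reading the documentation of the perf_event_open() function (assuming you already have a good understanding of how a CPU and a performance monitoring unit work; it is clearly not a computer architecture course):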
http://web.eece.maine.edu/~vweaver/projects/perf_events/perf_event_open.html
For example, by reading it you can see that the cache-misses event shown by perf list corresponds to PERF_COUNT_HW_CACHE_MISSES.
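To make that mapping a bit more concrete, here is a minimal C sketch, assuming a Linux system with <linux/perf_event.h> available. The helper names (init_cache_misses_attr, hw_cache_config, sys_perf_event_open, count_event, sample_workload) are made up for illustration and are not part of perf. It sets up the generic PERF_COUNT_HW_CACHE_MISSES event, and separately shows how the perf_event_open() documentation encodes a PERF_TYPE_HW_CACHE event such as an L1 instruction cache read miss, which is a different counter rather than a component of the first one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Fill attr for the generic hardware event that perf lists as "cache-misses". */
static void init_cache_misses_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HARDWARE;
    attr->size = sizeof(*attr);
    attr->config = PERF_COUNT_HW_CACHE_MISSES;
    attr->disabled = 1;        /* start stopped, enable explicitly later */
    attr->exclude_kernel = 1;  /* user space only */
    attr->exclude_hv = 1;
}

/* Encode the config value of a PERF_TYPE_HW_CACHE event:
 * cache id | (operation id << 8) | (result id << 16). */
static uint64_t hw_cache_config(unsigned cache_id, unsigned op_id,
                                unsigned result_id)
{
    assert(cache_id < PERF_COUNT_HW_CACHE_MAX);
    assert(op_id < PERF_COUNT_HW_CACHE_OP_MAX);
    assert(result_id < PERF_COUNT_HW_CACHE_RESULT_MAX);
    return (uint64_t)cache_id | ((uint64_t)op_id << 8) |
           ((uint64_t)result_id << 16);
}

/* glibc provides no wrapper for perf_event_open, so go through syscall(). */
static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical workload: touch a 4 MiB buffer in 64-byte steps. */
static void sample_workload(void)
{
    static volatile unsigned char buf[1 << 22];
    for (size_t i = 0; i < sizeof(buf); i += 64)
        buf[i]++;
}

/* Count the event described by attr while fn() runs in the calling thread
 * (pid 0, any CPU). Returns 0 on success, -1 if the counter cannot be
 * opened or read (for example, when the kernel denies access). */
static int count_event(struct perf_event_attr *attr, void (*fn)(void),
                       uint64_t *count)
{
    int fd = (int)sys_perf_event_open(attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    ssize_t n = read(fd, count, sizeof(*count));
    close(fd);
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

After init_cache_misses_attr(&attr), a call such as count_event(&attr, sample_workload, &n) should leave the count in n, and it returns -1 where the kernel refuses access (a restrictive perf_event_paranoid setting, or a virtual machine without a PMU, are typical causes). To count L1 instruction cache read misses instead, set attr.type to PERF_TYPE_HW_CACHE and attr.config to hw_cache_config(PERF_COUNT_HW_CACHE_L1I, PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_RESULT_MISS). Which hardware counter each of these ends up on depends on the CPU model, so the two numbers are not expected to add up.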