我正在处理一些需要高度优化的 Java 代码,因为它将在我的主程序逻辑中的许多点调用的热函数中运行。此代码的一部分涉及将double变量乘以10任意非负int exponents。获得相乘值的一种快速方法(编辑:但不是最快的,请参见下面的更新 2)是switch在exponent:
double
10
int
exponent
switch
double multiplyByPowerOfTen(final double d, final int exponent) { switch (exponent) { case 0: return d; case 1: return d*10; case 2: return d*100; // ... same pattern case 9: return d*1000000000; case 10: return d*10000000000L; // ... same pattern with long literals case 18: return d*1000000000000000000L; default: throw new ParseException("Unhandled power of ten " + power, 0); } }
上面注释的省略号表示case int常量继续递增 1,因此上面的代码片段中确实有 19 cases。由于我不确定在case语句10thru中是否真的需要 10 的所有幂18,我运行了一些微基准测试,比较了使用此switch语句完成 1000 万次操作的时间switch与仅使用cases 0thru9的时间(exponent限制为 9 或更少避免打破精简switch)。我得到了一个相当令人惊讶的结果(至少对我来说!),越长switch的case语句实际上运行得越快。
case
18
0
9
在云雀中,我尝试添加更多case的 s ,它只是返回了虚拟值,并发现我可以让开关运行得更快,大约 22-27 个声明case的 s (即使在代码运行时这些虚拟案例实际上从未被命中) )。(同样,通过将先前常数case增加 s 以连续方式添加 s 。)这些执行时间差异不是很显着:对于 和 之间的随机,虚拟填充语句在 1.49 秒内完成 1000 万次执行,而未填充语句为 1.54 秒版本,每次执行总共节省 5ns。所以,不是那种让人痴迷于填充的东西case``1``exponent``0``10``switch``switch从优化的角度来看,声明值得付出努力。但是我仍然觉得奇怪和违反直觉的是,当添加更多的 s 时,它的 执行switch速度并没有变慢(或者可能充其量保持恒定的 O(1) 时间) 。case
case``1``exponent``0``10``switch``switch
这些是我通过对随机生成的exponent值进行各种限制而获得的结果。我没有将结果一直计算到1极限exponent,但曲线的总体形状保持不变,在 12-17 的情况下有一个山脊,在 18-28 之间有一个谷。所有测试都在 JUnitBenchmarks 中运行,使用共享容器获取随机值,以确保相同的测试输入。我还按照从最长switch语句到最短语句的顺序运行了测试,反之亦然,以尝试消除与排序相关的测试问题的可能性。如果有人想尝试重现这些结果,我已将我的测试代码放在 github repo 上。
1
那么,这里发生了什么?我的架构或微基准构建的一些变幻莫测?还是 Java在to范围内switch的执行速度真的比up to快一点?18``28 case``11``17
18``28
case``11``17
github 测试仓库“切换实验”
更新: 我清理了基准测试库,并在 /results 中添加了一个文本文件,其中包含更广泛的可能exponent值的一些输出。我还在测试代码中添加了一个不抛出Exceptionfrom的选项default,但这似乎不会影响结果。
Exception
default
更新 2: 在 2009 年的 xkcd 论坛上发现了一些关于这个问题的很好的讨论:http ://forums.xkcd.com/viewtopic.php?f=11&t=33524 。OP 对 using 的讨论Array.binarySearch()让我想到了上面求幂模式的简单的基于数组的实现。不需要二进制搜索,因为我知道其中的条目是什么array。它的运行速度似乎比 using 快 3 倍switch,显然是以牺牲一些控制流为代价的switch。该代码也已添加到 github 存储库中。
Array.binarySearch()
array
正如另一个答案所指出的那样,因为案例值是连续的(而不是稀疏的),为您的各种测试生成的字节码使用了一个开关表(字节码指令tableswitch)。
tableswitch
然而,一旦 JIT 开始它的工作并将字节码编译成汇编,tableswitch指令并不总是会产生一个指针数组:有时开关表会转换成看起来像 a 的东西lookupswitch(类似于if/else if结构)。
lookupswitch
if
else if
反编译由 JIT(热点 JDK 1.7)生成的程序集表明,当有 17 个或更少的情况时,它使用连续的 if/else,当有超过 18 个时使用指针数组(更有效)。
使用 18 这个幻数的原因似乎归结为MinJumpTableSizeJVM 标志的默认值(代码中的第 352 行附近)。
MinJumpTableSize
我已经在热点编译器列表中提出了这个问题,它似乎是过去测试的遗留问题。请注意,在执行了更多基准测试后,JDK 8 中删除了此默认值。
最后,当方法变得太长时(在我的测试中 > 25 个案例),它不再使用默认的 JVM 设置进行内联 - 这是当时性能下降的最可能原因。
在 5 种情况下,反编译的代码如下所示(注意 cmp/je/jg/jmp 指令,if/goto 的程序集):
[Verified Entry Point] # {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1' # parm0: xmm0:xmm0 = double # parm1: rdx = int # [sp+0x20] (sp of caller) 0x00000000024f0160: mov DWORD PTR [rsp-0x6000],eax ; {no_reloc} 0x00000000024f0167: push rbp 0x00000000024f0168: sub rsp,0x10 ;*synchronization entry ; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56) 0x00000000024f016c: cmp edx,0x3 0x00000000024f016f: je 0x00000000024f01c3 0x00000000024f0171: cmp edx,0x3 0x00000000024f0174: jg 0x00000000024f01a5 0x00000000024f0176: cmp edx,0x1 0x00000000024f0179: je 0x00000000024f019b 0x00000000024f017b: cmp edx,0x1 0x00000000024f017e: jg 0x00000000024f0191 0x00000000024f0180: test edx,edx 0x00000000024f0182: je 0x00000000024f01cb 0x00000000024f0184: mov ebp,edx 0x00000000024f0186: mov edx,0x17 0x00000000024f018b: call 0x00000000024c90a0 ; OopMap{off=48} ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@72 (line 83) ; {runtime_call} 0x00000000024f0190: int3 ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@72 (line 83) 0x00000000024f0191: mulsd xmm0,QWORD PTR [rip+0xffffffffffffffa7] # 0x00000000024f0140 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@52 (line 62) ; {section_word} 0x00000000024f0199: jmp 0x00000000024f01cb 0x00000000024f019b: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff8d] # 0x00000000024f0130 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@46 (line 60) ; {section_word} 0x00000000024f01a3: jmp 0x00000000024f01cb 0x00000000024f01a5: cmp edx,0x5 0x00000000024f01a8: je 0x00000000024f01b9 0x00000000024f01aa: cmp edx,0x5 0x00000000024f01ad: jg 0x00000000024f0184 ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56) 0x00000000024f01af: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff81] # 0x00000000024f0138 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@64 (line 66) ; {section_word} 0x00000000024f01b7: jmp 0x00000000024f01cb 0x00000000024f01b9: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff67] # 0x00000000024f0128 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@70 (line 68) ; {section_word} 0x00000000024f01c1: jmp 0x00000000024f01cb 0x00000000024f01c3: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff55] # 0x00000000024f0120 ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56) ; {section_word} 0x00000000024f01cb: add rsp,0x10 0x00000000024f01cf: pop rbp 0x00000000024f01d0: test DWORD PTR [rip+0xfffffffffdf3fe2a],eax # 0x0000000000430000 ; {poll_return} 0x00000000024f01d6: ret
在 18 种情况下,程序集看起来像这样(注意使用的指针数组并抑制了所有比较的需要:jmp QWORD PTR [r8+r10*1]直接跳转到右乘法)——这可能是性能改进的原因:
jmp QWORD PTR [r8+r10*1]
[Verified Entry Point] # {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1' # parm0: xmm0:xmm0 = double # parm1: rdx = int # [sp+0x20] (sp of caller) 0x000000000287fe20: mov DWORD PTR [rsp-0x6000],eax ; {no_reloc} 0x000000000287fe27: push rbp 0x000000000287fe28: sub rsp,0x10 ;*synchronization entry ; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56) 0x000000000287fe2c: cmp edx,0x13 0x000000000287fe2f: jae 0x000000000287fe46 0x000000000287fe31: movsxd r10,edx 0x000000000287fe34: shl r10,0x3 0x000000000287fe38: movabs r8,0x287fd70 ; {section_word} 0x000000000287fe42: jmp QWORD PTR [r8+r10*1] ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56) 0x000000000287fe46: mov ebp,edx 0x000000000287fe48: mov edx,0x31 0x000000000287fe4d: xchg ax,ax 0x000000000287fe4f: call 0x00000000028590a0 ; OopMap{off=52} ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@202 (line 96) ; {runtime_call} 0x000000000287fe54: int3 ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@202 (line 96) 0x000000000287fe55: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe8b] # 0x000000000287fce8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@194 (line 92) ; {section_word} 0x000000000287fe5d: jmp 0x000000000287ff16 0x000000000287fe62: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe86] # 0x000000000287fcf0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@188 (line 90) ; {section_word} 0x000000000287fe6a: jmp 0x000000000287ff16 0x000000000287fe6f: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe81] # 0x000000000287fcf8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@182 (line 88) ; {section_word} 0x000000000287fe77: jmp 0x000000000287ff16 0x000000000287fe7c: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe7c] # 0x000000000287fd00 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@176 (line 86) ; {section_word} 0x000000000287fe84: jmp 0x000000000287ff16 0x000000000287fe89: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe77] # 0x000000000287fd08 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@170 (line 84) ; {section_word} 0x000000000287fe91: jmp 0x000000000287ff16 0x000000000287fe96: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe72] # 0x000000000287fd10 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@164 (line 82) ; {section_word} 0x000000000287fe9e: jmp 0x000000000287ff16 0x000000000287fea0: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe70] # 0x000000000287fd18 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@158 (line 80) ; {section_word} 0x000000000287fea8: jmp 0x000000000287ff16 0x000000000287feaa: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe6e] # 0x000000000287fd20 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@152 (line 78) ; {section_word} 0x000000000287feb2: jmp 0x000000000287ff16 0x000000000287feb4: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe24] # 0x000000000287fce0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@146 (line 76) ; {section_word} 0x000000000287febc: jmp 0x000000000287ff16 0x000000000287febe: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe6a] # 0x000000000287fd30 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@140 (line 74) ; {section_word} 0x000000000287fec6: jmp 0x000000000287ff16 0x000000000287fec8: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe68] # 0x000000000287fd38 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@134 (line 72) ; {section_word} 0x000000000287fed0: jmp 0x000000000287ff16 0x000000000287fed2: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe66] # 0x000000000287fd40 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@128 (line 70) ; {section_word} 0x000000000287feda: jmp 0x000000000287ff16 0x000000000287fedc: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe64] # 0x000000000287fd48 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@122 (line 68) ; {section_word} 0x000000000287fee4: jmp 0x000000000287ff16 0x000000000287fee6: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe62] # 0x000000000287fd50 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@116 (line 66) ; {section_word} 0x000000000287feee: jmp 0x000000000287ff16 0x000000000287fef0: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe60] # 0x000000000287fd58 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@110 (line 64) ; {section_word} 0x000000000287fef8: jmp 0x000000000287ff16 0x000000000287fefa: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe5e] # 0x000000000287fd60 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@104 (line 62) ; {section_word} 0x000000000287ff02: jmp 0x000000000287ff16 0x000000000287ff04: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe5c] # 0x000000000287fd68 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@98 (line 60) ; {section_word} 0x000000000287ff0c: jmp 0x000000000287ff16 0x000000000287ff0e: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe12] # 0x000000000287fd28 ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56) ; {section_word} 0x000000000287ff16: add rsp,0x10 0x000000000287ff1a: pop rbp 0x000000000287ff1b: test DWORD PTR [rip+0xfffffffffd9b00df],eax # 0x0000000000230000 ; {poll_return} 0x000000000287ff21: ret
最后,包含 30 个案例(下)的程序集看起来类似于 18 个案例,除了movapd xmm0,xmm1出现在代码中间的附加部分, 但是性能下降的最可能原因是方法太长期内联默认 JVM 设置:
movapd xmm0,xmm1
[Verified Entry Point] # {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1' # parm0: xmm0:xmm0 = double # parm1: rdx = int # [sp+0x20] (sp of caller) 0x0000000002524560: mov DWORD PTR [rsp-0x6000],eax ; {no_reloc} 0x0000000002524567: push rbp 0x0000000002524568: sub rsp,0x10 ;*synchronization entry ; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56) 0x000000000252456c: movapd xmm1,xmm0 0x0000000002524570: cmp edx,0x1f 0x0000000002524573: jae 0x0000000002524592 ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56) 0x0000000002524575: movsxd r10,edx 0x0000000002524578: shl r10,0x3 0x000000000252457c: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe3c] # 0x00000000025243c0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@364 (line 118) ; {section_word} 0x0000000002524584: movabs r8,0x2524450 ; {section_word} 0x000000000252458e: jmp QWORD PTR [r8+r10*1] ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56) 0x0000000002524592: mov ebp,edx 0x0000000002524594: mov edx,0x31 0x0000000002524599: xchg ax,ax 0x000000000252459b: call 0x00000000024f90a0 ; OopMap{off=64} ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@370 (line 120) ; {runtime_call} 0x00000000025245a0: int3 ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@370 (line 120) 0x00000000025245a1: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe27] # 0x00000000025243d0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@358 (line 116) ; {section_word} 0x00000000025245a9: jmp 0x0000000002524744 0x00000000025245ae: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe22] # 0x00000000025243d8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@348 (line 114) ; {section_word} 0x00000000025245b6: jmp 0x0000000002524744 0x00000000025245bb: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe1d] # 0x00000000025243e0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@338 (line 112) ; {section_word} 0x00000000025245c3: jmp 0x0000000002524744 0x00000000025245c8: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe18] # 0x00000000025243e8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@328 (line 110) ; {section_word} 0x00000000025245d0: jmp 0x0000000002524744 0x00000000025245d5: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe13] # 0x00000000025243f0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@318 (line 108) ; {section_word} 0x00000000025245dd: jmp 0x0000000002524744 0x00000000025245e2: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe0e] # 0x00000000025243f8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@308 (line 106) ; {section_word} 0x00000000025245ea: jmp 0x0000000002524744 0x00000000025245ef: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe09] # 0x0000000002524400 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@298 (line 104) ; {section_word} 0x00000000025245f7: jmp 0x0000000002524744 0x00000000025245fc: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe04] # 0x0000000002524408 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@288 (line 102) ; {section_word} 0x0000000002524604: jmp 0x0000000002524744 0x0000000002524609: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdff] # 0x0000000002524410 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@278 (line 100) ; {section_word} 0x0000000002524611: jmp 0x0000000002524744 0x0000000002524616: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdfa] # 0x0000000002524418 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@268 (line 98) ; {section_word} 0x000000000252461e: jmp 0x0000000002524744 0x0000000002524623: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffd9d] # 0x00000000025243c8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@258 (line 96) ; {section_word} 0x000000000252462b: jmp 0x0000000002524744 0x0000000002524630: movapd xmm0,xmm1 0x0000000002524634: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe0c] # 0x0000000002524448 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@242 (line 92) ; {section_word} 0x000000000252463c: jmp 0x0000000002524744 0x0000000002524641: movapd xmm0,xmm1 0x0000000002524645: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffddb] # 0x0000000002524428 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@236 (line 90) ; {section_word} 0x000000000252464d: jmp 0x0000000002524744 0x0000000002524652: movapd xmm0,xmm1 0x0000000002524656: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdd2] # 0x0000000002524430 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@230 (line 88) ; {section_word} 0x000000000252465e: jmp 0x0000000002524744 0x0000000002524663: movapd xmm0,xmm1 0x0000000002524667: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdc9] # 0x0000000002524438 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@224 (line 86) ; {section_word} [etc.] 0x0000000002524744: add rsp,0x10 0x0000000002524748: pop rbp 0x0000000002524749: test DWORD PTR [rip+0xfffffffffde1b8b1],eax # 0x0000000000340000 ; {poll_return} 0x000000000252474f: ret