Granlund and Montgomery proposed a method for optimizing unsigned integer division by constants [3]. Their method (called the GM method in this paper) was later refined in part by works such as [1] and [7], and is now used by major compilers including GCC, Clang, Microsoft Compiler, and Apple Clang. However, for some divisions, such as x/7, the generated code is designed for 32-bit CPUs and therefore does not fully exploit 64-bit capabilities. This paper proposes an optimization method for 32-bit unsigned division by constants that targets 64-bit CPUs. We implemented patches for LLVM and GCC and, in the microbenchmark described later, achieved speedups of 1.67x on an Intel Xeon w9-3495X (Sapphire Rapids) and 1.98x on an Apple M4 (Apple M-series SoC). The LLVM patch has already been merged into llvm:main [6], demonstrating the practical applicability of the proposed method.
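To make the x/7 case concrete, the following C sketch contrasts two ways of computing a 32-bit unsigned x/7 without a divide instruction. It is an illustration only, not the code from the patches: the function names `div7_gm32` and `div7_mulhi64` are hypothetical, the first function follows the well-known 32-bit style sequence for a 33-bit magic constant, and the second shows one plausible way a 64-bit CPU can absorb that constant with a single 64x64 high multiply. The second function assumes a compiler that provides `unsigned __int128` (a GCC/Clang extension).

```c
#include <stdint.h>

/* Hypothetical name. 32-bit style sequence: ceil(2^35 / 7) = 0x124924925
 * needs 33 bits, so only its low 32 bits are multiplied and the missing
 * top bit is compensated with a subtract, a shift, and an add. */
uint32_t div7_gm32(uint32_t x)
{
    /* High 32 bits of x * 0x24924925. */
    uint32_t t = (uint32_t)(((uint64_t)x * UINT32_C(0x24924925)) >> 32);
    /* (t + (x - t) / 2) / 4 equals floor(x * 0x124924925 / 2^35). */
    return (t + ((x - t) >> 1)) >> 2;
}

/* Hypothetical name. 64-bit sketch (assumes unsigned __int128 support):
 * pre-shift the full 33-bit constant left by 29 so that the high 64 bits
 * of the 128-bit product are exactly (x * 0x124924925) >> 35. */
uint32_t div7_mulhi64(uint32_t x)
{
    const uint64_t magic = UINT64_C(0x124924925) << 29; /* fits in 62 bits */
    return (uint32_t)(((unsigned __int128)x * magic) >> 64);
}
```

Both functions return the same quotient as `x / 7` for every 32-bit `x`. The second form replaces the subtract/shift/add fix-up with one widening multiply, which is the kind of saving a 64-bit target makes possible.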