Large-number arithmetic is widely used in scientific computing and cryptography, yet it has seen limited adoption of single instruction, multiple data (SIMD) parallelism on modern CPUs because of the inherent dependencies in traditional algorithms. We present DigitsOnTurbo (DoT), which, rather than vectorizing the standard algorithms, restructures the computation around independent, data-parallel operations that map directly onto SIMD hardware. Compared with prior SIMD implementations, DoT achieves speedups of up to 1.85x for addition and subtraction and up to 2.3x for multiplication. When integrated into state-of-the-art libraries, DoT yields speedups of up to 4x for addition and subtraction and up to 2x for multiplication. These gains cascade into end-to-end throughput improvements of up to 19.3% for scientific computations, and into latency and throughput improvements of up to 7.9% and 5.9%, respectively, for cryptographic implementations.
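To make the dependency problem concrete, the sketch below contrasts schoolbook multi-limb addition, where each limb waits on the carry from the previous one, with a restructured version in which every per-limb step is independent and the carries are resolved by a single integer addition over bit masks. This is a generic, well-known carry-resolution trick shown for illustration only; the abstract does not describe DoT's actual algorithm, and nothing here should be read as its implementation. The function names, the 64-bit limb width, the little-endian limb order, and the equal-length requirement are all assumptions of this example, and Python list comprehensions merely stand in for SIMD lanes.

```python
MASK = (1 << 64) - 1  # assumed limb width: 64 bits


def add_sequential(a, b):
    """Schoolbook addition: each limb depends on the previous carry."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> 64
    return out, carry


def add_data_parallel(a, b):
    """Illustrative restructuring (not DoT itself): independent per-limb
    steps plus one mask addition that resolves every carry at once.
    Limbs are little-endian; a and b must have the same length."""
    n = len(a)
    # Step 1: per-limb sums, no cross-limb dependency (one SIMD add).
    sums = [(x + y) & MASK for x, y in zip(a, b)]
    # Step 2: per-limb flags, also independent (SIMD compares -> masks).
    gen = 0   # bit i set: limb i overflowed and generates a carry
    prop = 0  # bit i set: limb i is all ones and would pass a carry on
    for i, (s, x) in enumerate(zip(sums, a)):
        if s < x:
            gen |= 1 << i
        if s == MASK:
            prop |= 1 << i
    # Step 3: one scalar addition ripples carries through propagate runs.
    # Bit i of carry_in is the carry entering limb i; bit n is the carry out.
    carry_in = ((gen << 1) + prop) ^ prop
    # Step 4: apply the incoming carries, again independently per limb.
    out = [(s + ((carry_in >> i) & 1)) & MASK for i, s in enumerate(sums)]
    return out, (carry_in >> n) & 1
```

For example, with `a = [MASK, MASK, 0, 5]` and `b = [1, 0, 0, 7]`, both functions return the same limbs and carry-out, but only the first has a loop-carried dependency across limbs; in the second, the only cross-limb step is the single mask addition in step 3.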