The accuracy requirements of many scientific computing workloads lead to the use of double-precision floating-point arithmetic in the execution kernels. Nevertheless, emerging real-number representations, such as posit arithmetic, show promise in delivering even higher accuracy in such computations. In this work, we explore the native use of 64-bit posits in a series of numerical benchmarks and compare their timing performance, accuracy, and hardware cost to those of IEEE 754 doubles. We also study the conjugate gradient method for numerically solving systems of linear equations in real-world applications. To this end, we extend the PERCIVAL RISC-V core and the Xposit custom RISC-V extension with posit64 and quire operations. Results show that posit64 can obtain up to 4 orders of magnitude lower mean square error than doubles, which reduces the number of iterations required for convergence in some iterative solvers. However, leveraging the quire accumulator register can constrain the order in which some operations, such as matrix multiplications, are performed. Furthermore, detailed FPGA and ASIC synthesis results highlight the significant hardware cost of 64-bit posit arithmetic and the quire. Despite this cost, the large accuracy improvements achieved with the same memory bandwidth suggest that posit arithmetic may provide an alternative representation for scientific computing.
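The role of the quire can be illustrated with a minimal sketch. It does not use posits or the Xposit instructions. Instead it assumes that Python's exact `Fraction` type is an acceptable stand-in for a wide accumulator that defers rounding until the end of a dot product, and it compares that against ordinary double-precision accumulation. The function names `dot_rounded_each_step` and `dot_deferred_rounding` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def dot_rounded_each_step(xs, ys):
    """Dot product in IEEE 754 binary64, rounding after every multiply and add."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def dot_deferred_rounding(xs, ys):
    """Dot product accumulated exactly and rounded once at the end.

    Fraction is only a stand-in (an assumption of this sketch) for a
    quire-style accumulator; it is not how the hardware is implemented.
    """
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        # Fraction(float) is exact, so no rounding happens inside the loop.
        acc += Fraction(x) * Fraction(y)
    return float(acc)  # single rounding step


# The small term 1.0 is absorbed by 1e16 when every step is rounded.
xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
naive = dot_rounded_each_step(xs, ys)
fused = dot_deferred_rounding(xs, ys)
print(naive, fused)
```

With these inputs the per-step version returns 0.0, while the exact accumulation returns 1.0. Because every partial product of one result must pass through the same accumulator before the final rounding, a matrix multiplication written in this style computes each output element as one complete dot product, which is one plausible reading of the ordering constraint mentioned in the abstract.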