Modern computer architectures support low-precision arithmetic, which presents opportunities for mixed-precision algorithms that achieve high computational throughput and reduce energy consumption. As a growing number of scientific computations leverage specialized hardware accelerators, the risk of rounding errors increases, potentially compromising the reliability of models. This shift toward hardware-optimized, low-precision computation underscores the importance of rounding error analysis, ensuring that performance gains do not come at the expense of accuracy, especially in high-stakes scientific applications. In this work, we conduct rounding error analysis of widely used operations such as the fused multiply-add (FMA), the mixed-precision FMA (MPFMA), and NVIDIA Tensor Cores. We present deterministic and probabilistic approaches to quantifying the accumulated rounding errors. Numerical experiments perform the multiply-accumulate (MAC) operation and matrix-matrix multiplication on Tensor Cores with random data. We show that, for matrix-matrix multiplication, the probabilistic bounds produce estimates tighter by nearly an order of magnitude than the deterministic ones.
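As an illustrative sketch (not the paper's code), the following emulates a Tensor-Core-style mixed-precision MAC in NumPy: operands are stored in fp16 and products are accumulated in an fp32 accumulator, with an fp64 reference used to measure the accumulated rounding error. The vector length and random data are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical setup: random fp16 vectors of length n (illustrative choice).
rng = np.random.default_rng(0)
n = 4096
a = rng.standard_normal(n).astype(np.float16)
b = rng.standard_normal(n).astype(np.float16)

# Mixed-precision MAC: fp16 operands, fp32 accumulator.
# An fp16 x fp16 product is exact in fp32 (22 significand bits fit in 24),
# so rounding error enters only through the repeated fp32 additions.
acc = np.float32(0.0)
for x, y in zip(a, b):
    acc = np.float32(acc + np.float32(x) * np.float32(y))

# fp64 reference to measure the accumulated rounding error.
ref = np.dot(a.astype(np.float64), b.astype(np.float64))
err = abs(float(acc) - ref)
print(f"mixed-precision MAC error vs fp64 reference: {err:.3e}")
```

In practice the observed error typically grows like the square root of the number of accumulations, in line with probabilistic bounds, rather than linearly as the deterministic worst-case bound suggests.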