The literature on algorithms for the multi-term addition $s_n=\sum_{i=1}^n x_i$ in floating-point arithmetic often shows that a hardware unit with a single normalization and rounding step improves precision, area, latency, and power consumption compared with standard add or fused multiply-add units. However, computing sums with a subclass of multi-term addition units can exhibit non-monotonicity, which has not yet been explored in the literature. We demonstrate that common techniques for performing multi-term addition with $n\geq 4$, without normalization of intermediate quantities, can result in non-monotonicity: increasing one of the addends $x_i$ decreases the sum $s_n$. Summation is required in dot product and matrix multiplication, operations that increasingly appear in the hardware of supercomputers, so knowing where monotonicity is preserved can be of interest to the users of these machines. Our results suggest that non-monotonicity of summation in some commercial hardware devices that implement a specific class of multi-term adders is a feature that may have appeared unintentionally, as a consequence of design choices that reduce circuit area and other metrics. To demonstrate our findings, we use formal proofs as well as a numerical simulation of non-monotonic multi-term adders in MATLAB.
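As an illustration of the effect, the following is a minimal Python sketch of one possible multi-term adder model; it is not the MATLAB simulation mentioned above. The model is an assumption made for this example: addends carry `p`-bit significands, every addend is aligned to the largest exponent and truncated toward zero with no extra guard bits, the aligned values are summed exactly, and the result is normalized and truncated to `p` bits once at the end. The function names and the toy precision `p = 4` are hypothetical.

```python
import math
from fractions import Fraction


def exponent(x):
    """Return floor(log2(|x|)) for a nonzero float x."""
    _, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    return e - 1


def truncate(value, ulp):
    """Truncate an exact value toward zero to a multiple of ulp."""
    return math.trunc(value / ulp) * ulp


def multi_term_add(xs, p=4):
    """Toy multi-term adder (assumed model, not a specific device).

    1. Align every addend to the largest exponent among the inputs.
    2. Truncate each aligned addend to p significand bits (no guard bits).
    3. Add the aligned values exactly, with no intermediate normalization.
    4. Normalize and truncate the sum to p bits once.
    """
    nonzero = [x for x in xs if x != 0]
    if not nonzero:
        return 0.0
    e_max = max(exponent(x) for x in nonzero)
    ulp_in = Fraction(2) ** (e_max - p + 1)
    acc = sum(truncate(Fraction(x), ulp_in) for x in xs)
    if acc == 0:
        return 0.0
    # Single normalization and rounding step (truncation is assumed here).
    ulp_out = Fraction(2) ** (exponent(float(acc)) - p + 1)
    return float(truncate(acc, ulp_out))


small = [0.9375, 0.9375, 0.9375]
s_before = multi_term_add([7.5] + small)  # largest addend is 7.5
s_after = multi_term_add([8.0] + small)   # largest addend raised to 8.0
print(s_before, s_after)
```

In this model, raising the first addend from 7.5 to 8.0 raises the alignment exponent, so the three small addends lose all of their retained bits and the computed sum drops from 9.0 to 8.0. Real devices may keep extra alignment bits or use a different final rounding, which changes the inputs that trigger the effect.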