The literature on floating-point algorithms for multi-term addition, $s_n=\sum_{i=1}^n x_i$, often shows that a hardware unit with a single normalization and rounding step improves precision, area, latency, and power consumption compared with standard add or fused multiply-add units. However, a subclass of multi-term addition units can produce non-monotonic sums, a behavior not yet explored in the literature. We demonstrate that common techniques for multi-term addition with $n\geq 4$ that do not normalize intermediate quantities can result in non-monotonicity: increasing one of the addends $x_i$ decreases the sum $s_n$. Summation underlies dot product and matrix multiplication, operations increasingly implemented in the hardware of supercomputers, so knowing where monotonicity is preserved can be of interest to users of these machines. Our results suggest that non-monotonicity of summation in some commercial hardware devices that implement a specific class of multi-term adders may have appeared unintentionally, as a consequence of design choices that reduce circuit area and improve other metrics. To demonstrate our findings, we use formal proofs as well as a numerical simulation of non-monotonic multi-term adders in MATLAB.
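As a rough illustration of the effect described above, the following Python sketch models a toy four-term adder. It is not the MATLAB simulation mentioned in the abstract, and every detail is an assumption made for illustration: a 4-bit significand (`P`), positive inputs given as hypothetical `(m, e)` pairs meaning $m \cdot 2^e$, alignment to the largest exponent with truncation of the shifted-out bits, and a single round-to-nearest-even step at the end.

```python
from fractions import Fraction

P = 4  # significand width of the toy format (an assumption for illustration)


def toy_multi_term_add(terms):
    """Sum positive terms given as (m, e) pairs, each meaning m * 2**e.

    Toy model: align all significands to the largest exponent, truncate
    the bits shifted out, add the aligned integers exactly, then
    normalize and round once (round-to-nearest-even) to P bits.
    """
    e_max = max(e for _, e in terms)
    # Alignment: right shifts discard low-order bits of smaller terms.
    acc = sum(m >> (e_max - e) for m, e in terms)
    # Single normalization and rounding of the accumulated integer.
    shift = max(acc.bit_length() - P, 0)
    if shift:
        half = 1 << (shift - 1)
        q, r = acc >> shift, acc & ((1 << shift) - 1)
        if r > half or (r == half and q & 1):
            q += 1
        acc = q << shift
    return Fraction(acc) * Fraction(2) ** e_max


small = [(15, -1)] * 3                           # three addends equal to 7.5
low = toy_multi_term_add([(15, 0)] + small)      # x_1 = 15
high = toy_multi_term_add([(8, 1)] + small)      # x_1 = 16, the next value up
print(low, high)
```

In this toy model, raising $x_1$ from 15 to 16 increases the largest exponent, so each 7.5 is truncated to 6 rather than 7 during alignment, and the computed sum drops from 36 to 32 even though the exact sum rises from 37.5 to 38.5.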