Summing a set of numbers, known as accumulation, is a subtask of many computational tasks. When the numbers arrive in back-to-back clock cycles at a high clock frequency, summing them without letting them pile up is challenging whenever the latency of an addition (i.e., summing two numbers) exceeds one clock cycle. This is always the case for floating-point numbers, and it can also be the case for integer summation at high clock frequencies. For floating-point numbers, the adder is pipelined to handle this, but pipelining alone does not solve all problems. The challenges include optimizing speed, area, and latency, as well as making the design adaptable to different application requirements, such as handling consecutive variable-size data sets with no time gap in between and producing results in input order. All these factors make designing an efficient floating-point accumulator a non-trivial problem. Integer accumulation is a comparatively simpler problem: high frequencies can be achieved with carry-save tree adders, and the design can be improved further through efficient resource sharing. In this paper, we present two fast and area-efficient accumulation circuits, JugglePAC and INTAC. JugglePAC is tailored for floating-point reduction operations (such as accumulation) and offers significant advantages over the literature in terms of speed, area, and adaptability to various application requirements. INTAC is designed for fast integer accumulation. Using carry-save adders and resource sharing, it achieves very high clock frequencies while maintaining low area complexity.
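
To illustrate why carry-save addition suits high-frequency integer accumulation, the following is a minimal software sketch of the general technique. It is not the INTAC circuit itself; the function names and the 32-bit default width are assumptions made for this example. The running total is kept as two words, a sum word and a carry word, so each incoming value needs only bitwise operations with no carry propagation. The single carry-propagating addition is deferred until the data set ends.

```python
def carry_save_step(s, c, x, width=32):
    """One 3:2 compression: returns (s', c') with s' + c' == s + c + x (mod 2**width)."""
    mask = (1 << width) - 1
    new_s = s ^ c ^ x                            # per-bit sum, no carry propagation
    new_c = ((s & c) | (s & x) | (c & x)) << 1   # per-bit majority, shifted to the next bit position
    return new_s & mask, new_c & mask


def carry_save_accumulate(values, width=32):
    """Accumulate integers modulo 2**width using a carry-save running total."""
    mask = (1 << width) - 1
    s, c = 0, 0
    for x in values:
        # In hardware, this step is a single layer of full adders per input.
        s, c = carry_save_step(s, c, x & mask, width)
    # The only carry-propagating addition happens once, at the end.
    return (s + c) & mask
```

In this sketch, the per-input work corresponds to one layer of full adders regardless of word width, which is what allows the per-cycle critical path to stay short in a hardware realization.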