Fast combinational multipliers with large bit widths can occupy significant silicon area, which also drives up power consumption. Area can be reduced through resource sharing (i.e., folding) at the expense of lower throughput, which is acceptable for some applications. This work explores multiple architectures for Multi-Cycle folded Integer Multiplier (MCIM) designs, which are based on Schoolbook and Karatsuba approaches. Applications sometimes require a fractional number of multiplications to be performed per cycle. For example, an algorithm may only require 3.5 multiplications per cycle. In such a case, 3 multipliers with a throughput of 1 plus an additional smaller multiplier with a throughput of $1/2$ would be sufficient to maintain the algorithm's throughput. Our MCIM design generator offers customization in terms of throughput, latency, and clock frequency. MCIM designs were synthesized and verified for various parameter values using scripts. ASIC synthesis results show that MCIM designs with a throughput of $1/2$ offer area savings of up to 44% for bit widths of 8 to 128 with respect to directly synthesizing the * operator. Additionally, MCIM designs can offer up to 33% energy savings and 65% average peak power reduction.
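To make the folding idea and the 3.5-multiplications example concrete, the following is a minimal behavioral sketch in Python. It is not the MCIM generator or any architecture from this work. It assumes one plausible way of folding a Schoolbook multiplier: one operand is split into equal chunks, and a single narrower multiplier is reused once per cycle while an accumulator collects the shifted partial products. The names `folded_schoolbook_multiply` and `multiplier_plan` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def folded_schoolbook_multiply(a: int, b: int, width: int, cycles: int = 2) -> int:
    """Behavioral model of a folded Schoolbook multiplier (illustrative assumption).

    One (width x width/cycles) multiplier is reused for `cycles` cycles,
    so the modeled throughput is 1/cycles multiplications per cycle.
    """
    if width % cycles != 0:
        raise ValueError("width must be divisible by cycles")
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must be unsigned and fit in `width` bits")

    chunk = width // cycles
    mask = (1 << chunk) - 1
    acc = 0
    for cycle in range(cycles):
        # Select the chunk of b that is processed in this cycle.
        b_chunk = (b >> (cycle * chunk)) & mask
        # The single shared narrow multiplier: width x chunk bits.
        partial = a * b_chunk
        # Accumulate the partial product at its bit position.
        acc += partial << (cycle * chunk)
    return acc


def multiplier_plan(mults_per_cycle: Fraction) -> tuple[int, Fraction]:
    """Split a required rate into full-throughput units plus a fractional remainder."""
    full_units = mults_per_cycle.numerator // mults_per_cycle.denominator
    remainder = mults_per_cycle - full_units
    return full_units, remainder


# 16-bit operands folded over 2 cycles (throughput 1/2).
product = folded_schoolbook_multiply(0xABCD, 0x1234, width=16, cycles=2)

# 3.5 multiplications per cycle: 3 full multipliers plus one with throughput 1/2.
plan = multiplier_plan(Fraction(7, 2))
```

In this model the area saving would come from the narrower shared multiplier, at the cost of an accumulator and control logic that a real design must also account for.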