Fast combinational multipliers with large bit widths can occupy significant silicon area. If the application allows a multiplication to take two or more clock cycles, the area can be reduced through resource sharing (i.e., folding). This work introduces multiple architectures and parameterized Verilog circuit generators for Multi-Cycle folded Integer Multiplier (MCIM) designs based on the Schoolbook and Karatsuba approaches. A hardware implementation of an application may need to perform a fractional number of multiplications per cycle on average, such as 3.5. In that case, we can use 3 single-cycle multipliers plus an additional smaller multiplier with a throughput (TP) of 0.5. Our MCIM designs can be customized in terms of TP, latency, and clock frequency. MCIM designs target a TP of $1/n$, where $n$ is an integer and $n \geq 2$. All proposed designs were synthesized and verified for various bit widths using scripts. ASIC synthesis results show that MCIM designs with a TP of 1/2 offer area savings of 21% to 48% for bit widths of 8 to 128, relative to synthesizing the * operator. Additionally, MCIM designs can offer up to 33% energy savings and 84% average peak power reduction.
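To make the folding idea concrete, the following is a minimal behavioral sketch in Python of a Schoolbook-style multiplier with a TP of $1/n$. It is not the authors' architecture or their generated Verilog: the function name `folded_multiply`, its parameters, and the choice to split only the second operand into $n$ chunks are assumptions made for illustration, and the operands are assumed to be unsigned. Each loop iteration stands for one clock cycle in which a single shared, narrower multiplier is reused.

```python
def folded_multiply(a: int, b: int, width: int, n: int = 2) -> int:
    """Behavioral model of a folded multiply with throughput 1/n.

    Assumption (not from the source): operand b is split into n chunks,
    and one (width x chunk)-bit multiplier is reused once per cycle.
    Operands are treated as unsigned integers of `width` bits.
    """
    if n < 2:
        raise ValueError("folding needs n >= 2 cycles")
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` unsigned bits")

    chunk = -(-width // n)        # ceil(width / n): bits of b consumed per cycle
    mask = (1 << chunk) - 1
    acc = 0                       # accumulator register holding the running sum

    for cycle in range(n):
        # Select the chunk of b that is processed in this cycle.
        b_part = (b >> (cycle * chunk)) & mask
        # The single shared multiplier: width x chunk bits instead of width x width.
        partial = a * b_part
        # Align the partial product and accumulate it.
        acc += partial << (cycle * chunk)

    return acc


# Usage: an 8-bit multiply folded over two cycles matches the direct product.
assert folded_multiply(200, 100, width=8, n=2) == 200 * 100
```

In this sketch the area saving would come from the shared multiplier being roughly $1/n$ the size of a full one, at the cost of an accumulator and $n$ cycles per result. The actual MCIM designs may split operands and schedule partial products differently.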