As IoT and edge inference proliferate, there is a growing need to simultaneously optimize area and delay in lookup-table (LUT)-based multipliers that implement large numbers of low-bitwidth operations in parallel. This paper proposes a hardware-efficient, accurate 4-bit multiplier design for AMD Xilinx 7-series FPGAs that uses only 11 LUTs and two CARRY4 blocks. By reorganizing the logic functions mapped to the LUTs, the proposed method reduces the LUT count by one compared with the prior 12-LUT design while also shortening the critical path. Evaluation confirms that the circuit attains minimal resource usage and a critical-path delay of 2.750 ns.
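Because the multiplier is exact rather than approximate, any implementation must agree with the true product on all 256 input pairs. The following is a minimal behavioral sketch in Python of that check, built on the usual shift-and-add view of a 4-bit multiplication. It is not the paper's LUT and CARRY4 mapping, and the function names (`partial_products`, `multiply_4bit`, `verify_exhaustively`) are hypothetical, chosen only for illustration.

```python
def partial_products(a: int, b: int) -> list[int]:
    """Return the four partial-product rows of a * b, each already shifted.

    Row i is `a` shifted left by i when bit i of `b` is set, otherwise 0.
    """
    return [(a if (b >> i) & 1 else 0) << i for i in range(4)]


def multiply_4bit(a: int, b: int) -> int:
    """Behavioral model of an exact unsigned 4x4 multiplier (8-bit result)."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must be unsigned 4-bit values (0..15)")
    # Summing the shifted rows is the reference behavior; a hardware design
    # would realize this sum with LUTs and carry logic instead.
    return sum(partial_products(a, b)) & 0xFF


def verify_exhaustively(mul) -> list[tuple[int, int]]:
    """Return every (a, b) pair on which `mul` disagrees with a * b."""
    return [
        (a, b)
        for a in range(16)
        for b in range(16)
        if mul(a, b) != a * b
    ]


if __name__ == "__main__":
    mismatches = verify_exhaustively(multiply_4bit)
    print("mismatching input pairs:", len(mismatches))
```

A model like this could serve as the golden reference when simulating a candidate netlist: passing the candidate's evaluation function to `verify_exhaustively` in place of `multiply_4bit` would list every input pair on which it deviates from the exact product.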