The major challenge in designing multipliers for FPGAs is balancing several trade-offs: performance on the one hand, and on the other the consumption of resources, namely DSP blocks or look-up tables (LUTs). Because DSPs are a relatively limited resource, the problem of under- or over-utilization of DSPs has previously been addressed by multiplier tiling, in which multipliers are assembled from DSPs and small supplemental LUT multipliers. However, there has always been an efficiency gap between tiling-based multipliers and radix-4 Booth arrays. While the monolithic Booth array has been shown to be considerably more efficient in terms of LUT resources on many modern FPGA architectures, it typically possesses a significantly higher critical path delay (or latency when pipelined) than multipliers designed by tiling. This work proposes and analyzes the use of smaller Booth arrays as sub-multipliers that are integrated into existing tiling-based methods, such that better trade-off points between area and delay can be reached while utilizing a user-specified number of DSP blocks. Synthesis experiments show that the critical path delay can be reduced compared to large Booth arrays, while significant reductions in LUT resources are achieved compared to previous tiling approaches.
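The idea of mixing tile types can be illustrated with a small behavioral model. The following is only a sketch under stated assumptions, not the implementation evaluated in this work: the `Tile` class, the tile kinds `"dsp"`, `"lut"` and `"booth"`, the function names, and the tile dimensions in `EXAMPLE_TILES` are all hypothetical and chosen for illustration. It models an unsigned multiplier as a sum of shifted sub-products, where one tile is computed by a radix-4 Booth recoding instead of a plain product. It says nothing about LUT counts or delay, which can only be obtained from synthesis.

```python
from dataclasses import dataclass


def booth_radix4_digits(y, n_bits):
    """Recode an unsigned n_bits-wide integer into radix-4 Booth digits in {-2..2}."""
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    # Running up to n_bits (inclusive) adds the digit that absorbs the
    # zero-extension needed for an unsigned operand.
    for i in range(0, n_bits + 1, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, n_bits):
    """Multiply unsigned x by unsigned n_bits-wide y via radix-4 Booth partial products."""
    return sum((d * x) << (2 * k)
               for k, d in enumerate(booth_radix4_digits(y, n_bits)))


@dataclass(frozen=True)
class Tile:
    """Hypothetical tile: a rectangle of the partial-product matrix."""
    x_off: int   # least significant bit of the x slice
    y_off: int   # least significant bit of the y slice
    x_bits: int  # width of the x slice
    y_bits: int  # width of the y slice
    kind: str    # "dsp", "lut" or "booth" (labels are assumptions)


def slice_bits(value, offset, width):
    """Extract `width` bits of `value` starting at bit `offset`."""
    return (value >> offset) & ((1 << width) - 1)


def tile_product(tile, x, y):
    """Sub-product of one tile, shifted to its position in the result."""
    xs = slice_bits(x, tile.x_off, tile.x_bits)
    ys = slice_bits(y, tile.y_off, tile.y_bits)
    if tile.kind == "booth":
        p = booth_multiply(xs, ys, tile.y_bits)
    else:
        # DSP and LUT tiles are modeled identically at the behavioral level.
        p = xs * ys
    return p << (tile.x_off + tile.y_off)


def tiled_multiply(tiles, x, y):
    """Sum all tile contributions; the tiles must cover the matrix exactly once."""
    return sum(tile_product(t, x, y) for t in tiles)


# Assumed tiling of a 24x24 unsigned multiplier (dimensions are illustrative only).
EXAMPLE_TILES = [
    Tile(x_off=0,  y_off=0,  x_bits=17, y_bits=24, kind="dsp"),
    Tile(x_off=17, y_off=0,  x_bits=7,  y_bits=16, kind="booth"),
    Tile(x_off=17, y_off=16, x_bits=7,  y_bits=8,  kind="lut"),
]

if __name__ == "__main__":
    x, y = 123456, 654321
    print(tiled_multiply(EXAMPLE_TILES, x, y) == x * y)
```

In this model the three tiles partition the 24x24 partial-product matrix without gaps or overlaps, which is the condition for the sum to equal the full product. Replacing a `"lut"` tile by a `"booth"` tile leaves the result unchanged; the area and delay consequences of such a swap are what the synthesis experiments measure.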