Neural-network (NN) inference is increasingly deployed on board spacecraft to reduce downlink bandwidth and enable timely decision making. However, the power and reliability constraints of space missions limit the applicability of many state-of-the-art NN accelerators. This paper presents bitSMM, a bit-serial matrix multiplication accelerator built around a systolic array of bit-serial multiply--accumulate (MAC) units. The design supports runtime-configurable operand precision from 1 to 16 bits and evaluates two MAC variants: a Booth-inspired architecture and an architecture based on standard binary multiplication with a correction step. We implement bitSMM in [System]Verilog and evaluate it on an AMD ZCU104 FPGA as well as through ASIC physical implementation using the asap7 and nangate45 process design kits. On the FPGA, bitSMM achieves up to 19.2~GOPS and 2.973~GOPS/W; in asap7 it achieves up to 73.22~GOPS, 552~GOPS/mm$^2$, and 40.8~GOPS/W.
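To make the bit-serial idea concrete, the following is a minimal behavioral sketch (not the paper's RTL) of a single signed bit-serial MAC with runtime-selectable precision: the weight is consumed one bit per cycle, LSB first, and the final bit is treated as the two's-complement sign bit and subtracted. The function name and interface are illustrative assumptions, not part of bitSMM.

```python
def bit_serial_mac(acc, a, w, precision):
    """Accumulate a * w into acc over `precision` serial steps.

    Behavioral model of a signed bit-serial MAC: each step consumes one
    bit of the two's-complement weight w (LSB first) and adds the shifted
    activation; the MSB (sign bit) contributes negatively.
    Illustrative sketch only; the hardware operates on registered partial sums.
    """
    w_bits = w & ((1 << precision) - 1)  # view w in `precision`-bit two's complement
    for i in range(precision):
        if (w_bits >> i) & 1:
            if i == precision - 1:       # sign bit: negative weight in two's complement
                acc -= a << i
            else:
                acc += a << i
    return acc
```

A 4-bit example: `bit_serial_mac(0, 3, -2, 4)` walks the bits of `-2` (`0b1110`), adding `3<<1` and `3<<2` and subtracting `3<<3`, yielding `-6`. Runtime-configurable precision corresponds to simply changing the number of serial steps, which is the mechanism the abstract's 1-to-16-bit range refers to.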