Deep neural network (DNN) inference increasingly relies on specialized hardware for high computational efficiency. This work introduces a field-programmable gate array (FPGA)-based, dynamically configurable accelerator featuring systolic arrays, high-bandwidth memory, and UltraRAMs. We present two processing unit (PU) configurations with different computing capabilities that share the same interfaces and peripheral blocks. By instantiating multiple PUs and employing a heuristic weight transfer schedule, the architecture achieves greater throughput efficiency than prior works. Moreover, we outline how the architecture can be extended to emulate analog in-memory computing (AIMC) devices, both to aid next-generation heterogeneous AIMC chip design and to investigate device-level noise behavior. Overall, this brief presents a versatile DNN inference acceleration architecture adaptable to various models and future FPGA designs.
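To make the AIMC emulation idea concrete, the sketch below illustrates one common way such device-level noise is modeled: perturbing the stored weights with additive Gaussian noise before a matrix-vector product. This is a minimal, hypothetical sketch; the function name, the Gaussian noise model, and the `noise_std` parameter are assumptions for illustration, not details taken from this brief.

```python
import numpy as np

def aimc_matvec(weights, x, noise_std=0.05, rng=None):
    """Emulate an AIMC crossbar matrix-vector product by adding
    Gaussian read noise to each stored weight (assumed noise model;
    noise_std is a hypothetical device parameter)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy_w = weights + rng.normal(0.0, noise_std, size=weights.shape)
    return noisy_w @ x

# Compare the ideal digital output with one noisy AIMC-style evaluation.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
x = rng.standard_normal(8)
ideal = W @ x
noisy = aimc_matvec(W, x, noise_std=0.05, rng=rng)
```

Running such an emulation many times per layer lets one estimate how a given noise level degrades end-to-end model accuracy before committing to an analog chip design.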