Neural network accelerators have been widely deployed on edge devices for complex tasks such as object tracking and image recognition. Prior works have explored quantization techniques in lightweight accelerator designs to reduce hardware resource consumption. However, low precision leads to high accuracy loss during inference. Mixed-precision quantization is therefore an alternative solution: applying different precisions to different layers trades off resource consumption against accuracy. Because conventional hardware multiplier designs cannot support runtime precision reconfiguration for a multi-precision Quantized Neural Network (QNN) model, we propose a runtime-reconfigurable multi-precision multi-channel bitwise systolic array design for QNN accelerators. We have implemented and evaluated our work on the Ultra96 FPGA platform. Results show that our design achieves a 1.3185x to 3.5671x speedup when inferring mixed-precision models and has a shorter critical path delay, supporting a higher clock frequency (250 MHz).
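The core idea enabling runtime precision reconfiguration in bitwise designs is bit-serial multiplication: a k-bit operand is consumed one bit per cycle as shift-and-add partial products, so the same datapath handles any precision by simply varying the number of cycles. The sketch below is an illustrative software model of this general principle under our own naming (`bit_serial_mul` is hypothetical), not the authors' systolic array implementation.

```python
def bit_serial_mul(activation: int, weight: int, bits: int) -> int:
    """Model of bit-serial multiplication for an unsigned `bits`-wide activation.

    Each iteration mimics one hardware cycle: inspect one activation bit
    and, if set, accumulate the weight shifted by that bit position.
    Precision is "reconfigured" at runtime just by changing `bits`.
    """
    activation &= (1 << bits) - 1  # clamp operand to the configured width
    acc = 0
    for i in range(bits):
        if (activation >> i) & 1:
            acc += weight << i  # partial product for bit i
    return acc

# Same datapath, different per-layer precisions:
print(bit_serial_mul(5, 3, 4))    # 4-bit activation: 5 * 3 = 15
print(bit_serial_mul(0xAB, 7, 8)) # 8-bit activation: 171 * 7 = 1197
```

Note that a k-bit layer finishes in k cycles, which is consistent with lower-precision layers running faster on such hardware and hence with the per-model speedups reported above.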