This paper presents BlendNet, a neural network architecture built from a novel building block, the Blend module, which performs binary convolutions in its main path and fixed-point convolutions in its skip path. Batch normalization is judiciously deployed on both the main and skip paths inside each Blend module, as well as between consecutive Blend modules. This paper also presents a compiler that maps BlendNet models onto FPGA devices, with the goal of minimizing end-to-end inference latency while achieving high output accuracy. These BlendNet models are obtained by replacing some blocks/modules of various vision neural network models with Blend modules. BlendNet-20, derived from ResNet-20 and trained on the CIFAR-10 dataset, achieves 88.0% classification accuracy (0.8% higher than the state-of-the-art binary neural network) while taking only 0.38 ms to process each image (1.4x faster than the state of the art). Similarly, our BlendMixer model trained on CIFAR-10 achieves 90.6% accuracy (1.59% lower than the full-precision MLPMixer) with a 3.5x reduction in model size. Moreover, we exploit the ability to reconfigure DSP blocks to perform 48-bit bitwise logic operations, which yields a low-power FPGA implementation. Our measurements show that the proposed implementation consumes 2.5x less power.
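To make the Blend module's two paths concrete, the following is a minimal, non-authoritative Python sketch of one output of a Blend-style block. It rests on several assumptions that do not come from the paper. Plain dot products stand in for convolutions. The binary main path is computed as XNOR plus popcount over 48-bit words, chosen only to mirror the 48-bit DSP logic width mentioned above. The skip path quantizes to a hypothetical `frac_bits` fixed-point format. The two batch-normalized paths are merged by addition. All function names (`pack_bits`, `binary_dot`, `fixed_dot`, `batch_norm`, `blend_unit`) are illustrative.

```python
import math

# Assumption: 48 mirrors the bitwise-logic width of the FPGA DSP blocks;
# the actual hardware mapping used by the compiler is not modeled here.
WORD_BITS = 48


def sign(v):
    """Binarize a real value to +1/-1 (zero maps to +1)."""
    return 1 if v >= 0 else -1


def pack_bits(values):
    """Pack a +1/-1 vector into 48-bit integers, one bit per element (1 means +1)."""
    words = []
    for start in range(0, len(values), WORD_BITS):
        word = 0
        for i, v in enumerate(values[start:start + WORD_BITS]):
            if v > 0:
                word |= 1 << i
        words.append(word)
    return words


def binary_dot(a, b):
    """Dot product of two +1/-1 vectors via XNOR and popcount on 48-bit words."""
    n = len(a)
    matches = 0
    for k, (x, y) in enumerate(zip(pack_bits(a), pack_bits(b))):
        valid = min(WORD_BITS, n - k * WORD_BITS)  # the last word may be partial
        lane_mask = (1 << valid) - 1
        matches += bin(~(x ^ y) & lane_mask).count("1")  # XNOR, then popcount
    return 2 * matches - n  # each match contributes +1, each mismatch -1


def fixed_dot(a, b, frac_bits=4):
    """Dot product after quantizing both operands to fixed point with frac_bits."""
    scale = 1 << frac_bits
    acc = sum(round(x * scale) * round(y * scale) for x, y in zip(a, b))
    return acc / (scale * scale)  # integer accumulator rescaled to real units


def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization of a scalar."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def blend_unit(x, w_main, w_skip, bn_main, bn_skip):
    """One output of a hypothetical Blend-style block: binary main path + fixed-point skip path."""
    main = binary_dot([sign(v) for v in x], [sign(v) for v in w_main])
    skip = fixed_dot(x, w_skip)
    # Assumption: each path gets its own batch norm, then the paths are added.
    return batch_norm(main, *bn_main) + batch_norm(skip, *bn_skip)


# Usage: bn tuples are (gamma, beta, running_mean, running_var).
x = [0.3, -1.2, 0.7, 0.05]
print(blend_unit(x, [0.5, -0.4, -0.9, 0.1], [0.25, 0.5, -0.5, 1.0],
                 bn_main=(1.0, 0.0, 0.0, 4.0), bn_skip=(1.0, 0.0, 0.0, 1.0)))
```

In `binary_dot`, each 48-bit word handles 48 multiply-accumulates with one XOR, one inversion, and one popcount. This illustrates why wide bitwise logic suits the binary path. The skip path keeps multi-bit precision at a higher arithmetic cost.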