High-performance deep neural network (DNN)-based systems are in high demand in edge environments. Because of their high computational complexity, DNNs are challenging to deploy on edge devices with strict limits on computational resources. In this paper, we derive a compact yet highly accurate DNN model, termed dsODENet, by combining two recently proposed parameter reduction techniques: Neural ODE (Ordinary Differential Equation) and DSC (Depthwise Separable Convolution). Neural ODE exploits the similarity between ResNet and ODEs and shares most of the weight parameters among multiple layers, which greatly reduces memory consumption. We apply dsODENet to domain adaptation as a practical use case with image classification datasets. We also propose a resource-efficient FPGA-based design for dsODENet, in which all parameters and feature maps, except those of the pre- and post-processing layers, can be mapped onto on-chip memories. The design is implemented on a Xilinx ZCU104 board and evaluated in terms of domain adaptation accuracy, inference speed, FPGA resource utilization, and speedup over a software counterpart. The results demonstrate that dsODENet achieves comparable or slightly better domain adaptation accuracy than our baseline Neural ODE implementation, while the total parameter size, excluding the pre- and post-processing layers, is reduced by 54.2% to 79.8%. Our FPGA implementation speeds up inference by a factor of 23.8.
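To make the two parameter reduction ideas concrete, the following is a minimal sketch in plain Python, not the dsODENet implementation. It assumes a forward Euler discretization for the ODE block (the solver is not specified here), omits bias terms, and uses hypothetical helper names (`conv_params`, `dsc_params`, `euler_ode_block`) and an illustrative layer shape (64 channels, 3x3 kernels). The resulting ratios are for this toy layer only and are not the reductions reported above.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def dsc_params(c_in, c_out, k):
    """Weight count of a depthwise separable convolution:
    a k x k depthwise filter per input channel plus a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out


def euler_ode_block(x, f, steps):
    """Integrate dx/dt = f(x, t) from t = 0 to t = 1 with forward Euler.

    Each update x <- x + h * f(x, t) has the same form as a residual block,
    but the same function f (hence the same weights) is reused at every
    step, so the block stores one set of weights regardless of `steps`.
    """
    h = 1.0 / steps
    t = 0.0
    for _ in range(steps):
        x = x + h * f(x, t)
        t += h
    return x


if __name__ == "__main__":
    # Illustrative layer shape (an assumption, not taken from the paper).
    c, k, n_blocks = 64, 3, 10

    standard = conv_params(c, c, k)   # 36864 weights
    separable = dsc_params(c, c, k)   # 4672 weights
    print("standard conv:", standard, "DSC:", separable)

    # A stack of n_blocks residual blocks stores n_blocks weight sets;
    # an ODE block iterated n_blocks times stores only one.
    print("stacked blocks:", n_blocks * standard, "ODE block:", standard)

    # Toy dynamics f(x, t) = -x stand in for the learned convolutional layers.
    print("Euler result:", euler_ode_block(1.0, lambda x, t: -x, steps=n_blocks))
```

The two reductions are independent: weight sharing removes the dependence of the parameter count on the number of iterated blocks, and DSC shrinks the single weight set that remains.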