Reducing inference time and energy usage while maintaining prediction accuracy has become a significant concern for deep neural network (DNN) inference on resource-constrained edge devices. To address this problem, we propose a novel approach based on a "converting" autoencoder and lightweight DNNs. Our approach improves upon recent work such as early-exiting frameworks and DNN partitioning. Early-exiting frameworks spend different amounts of computation on different inputs depending on their complexity. However, they can be inefficient in real-world scenarios that involve many hard image samples. DNN partitioning algorithms, on the other hand, use the computation power of both the cloud and edge devices, and can therefore be affected by network delays and intermittent connections between the two. We present CBNet, a low-latency and energy-efficient DNN inference framework tailored for edge devices. It uses a "converting" autoencoder to efficiently transform hard images into easy ones, which are subsequently processed by a lightweight DNN for inference. To the best of our knowledge, such an autoencoder has not been proposed before. Our experimental results using three popular image-classification datasets on a Raspberry Pi 4, a Google Cloud instance, and an instance with an Nvidia Tesla K80 GPU show that CBNet achieves up to a 4.8x speedup in inference latency and a 79% reduction in energy usage compared to competing techniques while maintaining similar or higher accuracy.
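The two-stage data flow described above (a "converting" autoencoder followed by a lightweight DNN) can be sketched as follows. This is only an illustrative sketch, not the CBNet implementation: the class names, layer sizes, 28x28 grayscale input shape, and randomly initialized weights are all assumptions, and the training objective that would make the autoencoder map hard images to easy ones is not described in this abstract and is omitted.

```python
import math
import random

random.seed(0)

# All sizes below are illustrative assumptions, not values from the paper.
IMG_DIM = 28 * 28   # flattened grayscale image
LATENT_DIM = 32     # autoencoder bottleneck width
NUM_CLASSES = 10    # number of output classes


def random_matrix(rows, cols, scale=0.05):
    """Return a rows x cols matrix of small Gaussian values (untrained weights)."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def vec_mat(vector, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    cols = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(cols)]


class ConvertingAutoencoder:
    """Hypothetical stand-in: encodes an image and decodes one of the same size."""

    def __init__(self):
        self.w_enc = random_matrix(IMG_DIM, LATENT_DIM)
        self.w_dec = random_matrix(LATENT_DIM, IMG_DIM)

    def convert(self, image):
        latent = [max(x, 0.0) for x in vec_mat(image, self.w_enc)]   # ReLU
        decoded = vec_mat(latent, self.w_dec)
        return [1.0 / (1.0 + math.exp(-x)) for x in decoded]         # sigmoid pixels


class LightweightClassifier:
    """Hypothetical stand-in for the lightweight DNN: one linear layer."""

    def __init__(self):
        self.w = random_matrix(IMG_DIM, NUM_CLASSES)

    def predict(self, image):
        logits = vec_mat(image, self.w)
        return max(range(NUM_CLASSES), key=lambda k: logits[k])      # argmax


def two_stage_inference(image, autoencoder, classifier):
    """Stage 1 converts the image; stage 2 classifies the converted image."""
    return classifier.predict(autoencoder.convert(image))


autoencoder = ConvertingAutoencoder()
classifier = LightweightClassifier()
image = [random.random() for _ in range(IMG_DIM)]   # synthetic input, not real data
converted = autoencoder.convert(image)
label = two_stage_inference(image, autoencoder, classifier)
```

Because the weights here are untrained, the predicted label is meaningless. The sketch only shows that every input passes through the same fixed pipeline, in contrast to early-exiting frameworks, where the amount of computation varies per input.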