Resource constraints on edge devices and the differing characteristics of deep neural network (DNN) models make it challenging to optimize DNN inference on edge devices in terms of energy consumption and inference latency. Beyond dynamic voltage and frequency scaling (DVFS), the edge-cloud architecture offers a collaborative approach to efficient DNN inference. However, existing edge-cloud collaborative inference methods do not optimize the various compute resources on edge devices. We therefore propose DVFO, a novel DVFS-enabled edge-cloud collaborative inference framework that jointly optimizes DVFS and offloading parameters via deep reinforcement learning (DRL). Specifically, DVFO automatically co-optimizes 1) the CPU, GPU, and memory frequencies of edge devices, and 2) the feature maps to be offloaded to cloud servers. In addition, it uses a thinking-while-moving concurrent mechanism to accelerate DRL learning, and a spatial-channel attention mechanism to extract DNN feature maps of secondary importance for workload offloading. This approach improves energy efficiency and reduces inference latency for different DNN models under various edge-cloud network conditions. Experimental results on different datasets show that DVFO reduces average energy consumption by 33% compared to state-of-the-art schemes and achieves up to a 54% reduction in end-to-end inference latency.
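The joint choice of device frequency and offloading proportion described in the abstract can be illustrated with a toy cost model. The sketch below is not DVFO's method: it replaces the DRL agent with an exhaustive grid search, collapses CPU, GPU, and memory frequencies into a single hypothetical `freq_ghz`, assumes dynamic power grows with the cube of frequency, and treats local compute, transmission, and cloud compute as sequential stages. Every constant and function name is a made-up assumption chosen only to make the trade-off visible.

```python
from itertools import product

# All constants are illustrative assumptions, not values from the paper.
WORK_GCYCLES = 4.0    # total inference work, in giga-cycles
FEATURE_MBITS = 8.0   # data sent if every feature map is offloaded
CLOUD_SPEEDUP = 10.0  # cloud throughput, in giga-cycles per second
TX_POWER_W = 1.0      # radio power while transmitting
POWER_COEFF = 2.0     # edge power in watts at 1 GHz


def cost(freq_ghz, offload_ratio, bandwidth_mbps, eta=0.5):
    """Return (weighted_cost, energy_joules, latency_seconds).

    eta weights edge energy against end-to-end latency.
    """
    local_time = (1.0 - offload_ratio) * WORK_GCYCLES / freq_ghz
    tx_time = offload_ratio * FEATURE_MBITS / bandwidth_mbps
    cloud_time = offload_ratio * WORK_GCYCLES / CLOUD_SPEEDUP
    # Assumption: the three stages run one after another.
    latency = local_time + tx_time + cloud_time
    # Assumption: dynamic power scales as f**3; cloud energy is ignored.
    energy = POWER_COEFF * freq_ghz ** 3 * local_time + TX_POWER_W * tx_time
    return eta * energy + (1.0 - eta) * latency, energy, latency


def best_setting(bandwidth_mbps, eta=0.5):
    """Grid-search stand-in for the DRL policy: pick (freq_ghz, offload_ratio)."""
    freqs = [0.6, 0.9, 1.2, 1.5]
    ratios = [0.0, 0.25, 0.5, 0.75, 1.0]
    return min(
        product(freqs, ratios),
        key=lambda s: cost(s[0], s[1], bandwidth_mbps, eta)[0],
    )
```

With these made-up constants, a fast link (`best_setting(100.0)`) favors offloading everything, while a slow link (`best_setting(0.5)`) favors running locally at the lowest frequency; setting `eta=0.0` on the slow link shifts the choice to the highest frequency because only latency is penalized.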