Because edge devices have limited resources and deep neural network (DNN) models differ in their characteristics, optimizing DNN inference on edge devices for both energy consumption and end-to-end latency is a major challenge. In addition to dynamic voltage and frequency scaling (DVFS), the edge-cloud architecture offers a collaborative approach to efficient DNN inference. However, current edge-cloud collaborative inference methods do not optimize the various compute resources on edge devices. We therefore propose DVFO, a novel DVFS-enabled edge-cloud collaborative inference framework that co-optimizes DVFS and offloading parameters via deep reinforcement learning (DRL). Specifically, DVFO automatically co-optimizes 1) the CPU, GPU, and memory frequencies of edge devices, and 2) the feature maps to be offloaded to cloud servers. In addition, it uses a thinking-while-moving concurrent mechanism to accelerate DRL learning, and a spatial-channel attention mechanism to extract DNN feature maps of secondary importance for workload offloading. This approach improves inference performance for different DNN models under various edge-cloud network conditions. Extensive evaluations using two datasets and six widely deployed DNN models on three heterogeneous edge devices show that DVFO reduces energy consumption by 33% on average compared to state-of-the-art schemes. Moreover, DVFO reduces end-to-end latency by up to 28.6%-59.1% while keeping accuracy loss within 1% on average.
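To make the offloading idea concrete, the following is a minimal sketch of splitting feature-map channels into a part kept on the edge device and a part of secondary importance sent to the cloud. It is not DVFO's actual mechanism: the function names `channel_scores` and `split_for_offloading`, the `offload_ratio` parameter, and the scoring rule (mean absolute activation per channel, standing in for a learned spatial-channel attention score) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def channel_scores(feature_maps: Sequence[Sequence[float]]) -> List[float]:
    """Score each channel by its mean absolute activation.

    Assumption: this simple statistic stands in for a learned
    spatial-channel attention score; it is not the paper's method.
    """
    return [sum(abs(v) for v in ch) / len(ch) for ch in feature_maps]


def split_for_offloading(
    feature_maps: Sequence[Sequence[float]], offload_ratio: float
) -> Tuple[List[int], List[int]]:
    """Return (local_channels, offloaded_channels) as sorted index lists.

    The lowest-scoring channels (secondary importance) are offloaded;
    the highest-scoring channels stay on the edge device.
    `offload_ratio` is a hypothetical knob that a DRL agent could choose
    alongside the CPU, GPU, and memory frequencies.
    """
    if not 0.0 <= offload_ratio <= 1.0:
        raise ValueError("offload_ratio must be in [0, 1]")
    scores = channel_scores(feature_maps)
    # Channel indices ordered from least to most important.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_offload = round(offload_ratio * len(order))
    offloaded = sorted(order[:n_offload])
    local = sorted(order[n_offload:])
    return local, offloaded


if __name__ == "__main__":
    # Four toy channels, each flattened to two activations.
    maps = [[0.1, -0.1], [2.0, 1.0], [0.0, 0.0], [-3.0, 3.0]]
    local, offloaded = split_for_offloading(maps, offload_ratio=0.5)
    print("keep on edge:", local)
    print("offload to cloud:", offloaded)
```

In this sketch the offload ratio is a single scalar, which keeps the decision space small; a real system would also need to account for network bandwidth and the cost of transmitting the offloaded channels.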