Deep learning models are increasingly deployed on edge devices for real-time applications. To ensure stable service quality across diverse edge environments, it is highly desirable to generate model architectures tailored to different conditions. However, conventional approaches that generate models before deployment fall short, because they struggle to handle the diversity of edge environments and require information about those environments in advance. In this paper, we propose to adapt the model architecture after deployment, in the target environment, where model quality can be measured precisely and private edge data can stay on the edge. To make edge model generation both efficient and effective, we introduce a pretraining-assisted on-cloud model elastification method and an edge-friendly on-device architecture search method. Model elastification generates a high-quality search space of model architectures under the guidance of a developer-specified oracle model. Each subnet in this space is a valid model with a different affinity to edge environments, and each device efficiently finds and maintains its most suitable subnet through a series of edge-tailored optimizations. Extensive experiments on various edge devices show that our approach achieves significantly better accuracy-latency tradeoffs than strong baselines (e.g., 46.74% higher average accuracy under a 60% latency budget) with minimal overhead (13 GPU hours in the cloud and 2 minutes on the edge server).
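To make the on-device step concrete, the following is a minimal sketch of selecting a subnet under a latency budget, such as 60% of the oracle model's latency. It is not the paper's search method. It swaps in plain exhaustive enumeration over two hypothetical elastic dimensions (`depth` and `width`). The names `Subnet`, `search_subnet`, `toy_latency`, and `toy_accuracy` are illustrative assumptions, and the two toy proxies stand in for latency measured on the device and accuracy evaluated on local edge data. A device that "maintains" its subnet could rerun this selection whenever its latency budget or conditions change.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Subnet:
    depth: int     # hypothetical elastic dimension: number of blocks kept
    width: float   # hypothetical elastic dimension: fraction of channels kept


def search_subnet(
    depths: Iterable[int],
    widths: Iterable[float],
    measure_latency: Callable[[Subnet], float],
    estimate_accuracy: Callable[[Subnet], float],
    budget_ms: float,
) -> Optional[Subnet]:
    """Return the most accurate subnet whose latency fits the budget, or None.

    Exhaustive enumeration is an assumption made for clarity; a real
    on-device search would prune or sample the space instead.
    """
    best: Optional[Subnet] = None
    best_acc = float("-inf")
    for d, w in product(depths, widths):
        cand = Subnet(d, w)
        # Skip candidates that violate the latency budget on this device.
        if measure_latency(cand) > budget_ms:
            continue
        acc = estimate_accuracy(cand)
        if acc > best_acc:
            best, best_acc = cand, acc
    return best


# Toy proxies (hypothetical): latency grows with depth * width, and
# accuracy grows with both dimensions. Real values would be measured.
def toy_latency(s: Subnet) -> float:
    return s.depth * s.width * 10.0


def toy_accuracy(s: Subnet) -> float:
    return 0.5 + 0.03 * s.depth + 0.2 * s.width


oracle = Subnet(depth=12, width=1.0)
budget = 0.6 * toy_latency(oracle)  # 60% of the oracle's latency
best = search_subnet([4, 8, 12], [0.5, 0.75, 1.0], toy_latency, toy_accuracy, budget)
print(best)
```

Under these toy proxies, the deep-but-narrow subnet wins because the accuracy proxy rewards depth more than width. With real measurements, the choice could differ from device to device, which is the reason for searching on the device itself.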