Edge AI has recently been proposed to facilitate the training and deployment of Deep Neural Network (DNN) models close to the sources of data. To train large models on resource-constrained edge devices while protecting data privacy, parallel split learning is becoming a practical and popular approach. However, existing parallel split learning schemes neglect the resource heterogeneity of edge devices, which can lead to the straggler problem. In this paper, we propose EdgeSplit, a novel parallel split learning framework that accelerates distributed model training on heterogeneous, resource-constrained edge devices. EdgeSplit improves training efficiency on less powerful edge devices by adaptively partitioning the model at different depths for different devices. Our approach reduces total training time by formulating and solving a task scheduling problem that determines the most efficient model partition point and bandwidth allocation for each device. We solve this problem with a straightforward yet effective alternating algorithm. Comprehensive experiments with a range of DNN models and datasets demonstrate that EdgeSplit not only enables the training of large models on resource-constrained edge devices but also outperforms existing baselines.
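The abstract does not spell out the scheduling formulation, so the following is only a minimal sketch of how an alternating scheme over partition points and bandwidth shares could look. Everything in it is an assumption made for illustration: the latency model (local compute, activation upload, server compute, with no server contention and no backward pass), the function names `device_time`, `allocate_bandwidth`, and `alternate`, and all numeric values. It is not EdgeSplit's actual algorithm.

```python
from typing import List, Tuple


def device_time(cut: int, flops: List[float], act_bits: List[float],
                f_dev: float, f_srv: float, bw: float) -> float:
    """Assumed round time for a device that runs layers [0, cut) locally."""
    local = sum(flops[:cut]) / f_dev      # on-device compute
    upload = act_bits[cut - 1] / bw       # send the cut-layer activations
    remote = sum(flops[cut:]) / f_srv     # remaining layers on the server
    return local + upload + remote


def allocate_bandwidth(fixed: List[float], comm: List[float],
                       iters: int = 60) -> List[float]:
    """Minimise max_k(fixed_k + comm_k / share_k) subject to sum(share) = 1.

    Bisection on the makespan T: the shares comm_k / (T - fixed_k)
    needed to meet T must sum to at most 1.
    """
    n = len(fixed)
    lo = max(fixed)
    hi = max(f + c * n for f, c in zip(fixed, comm))  # equal split is feasible
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(c / (mid - f) for f, c in zip(fixed, comm)) <= 1.0:
            hi = mid
        else:
            lo = mid
    shares = [c / (hi - f) for f, c in zip(fixed, comm)]
    total = sum(shares)
    return [s / total for s in shares]    # hand any slack back to the devices


def alternate(flops: List[float], act_bits: List[float], f_devs: List[float],
              f_srv: float, total_bw: float,
              max_rounds: int = 20) -> Tuple[List[int], List[float], float]:
    """Alternate between choosing cut points and bandwidth shares."""
    n, num_layers = len(f_devs), len(flops)

    def t(k: int, cut: int, share: float) -> float:
        return device_time(cut, flops, act_bits, f_devs[k], f_srv,
                           share * total_bw)

    cuts = [num_layers // 2] * n          # start: same cut, equal bandwidth
    shares = [1.0 / n] * n
    best = max(t(k, cuts[k], shares[k]) for k in range(n))
    for _ in range(max_rounds):
        # Step 1: with bandwidth fixed, each device picks its fastest cut.
        new_cuts = [min(range(1, num_layers),
                        key=lambda c, k=k: t(k, c, shares[k]))
                    for k in range(n)]
        # Step 2: with cuts fixed, rebalance bandwidth to shrink the slowest device.
        fixed = [sum(flops[:c]) / f_devs[k] + sum(flops[c:]) / f_srv
                 for k, c in enumerate(new_cuts)]
        comm = [act_bits[c - 1] / total_bw for c in new_cuts]
        new_shares = allocate_bandwidth(fixed, comm)
        new_best = max(t(k, new_cuts[k], new_shares[k]) for k in range(n))
        if new_best >= best - 1e-12:      # no further improvement
            break
        cuts, shares, best = new_cuts, new_shares, new_best
    return cuts, shares, best


# Hypothetical 4-layer model and three devices of very different speeds.
flops = [2e9, 4e9, 4e9, 1e9]              # per-layer FLOPs
act_bits = [8e6, 4e6, 2e6, 1e6]           # activation size after each layer
f_devs = [1e9, 5e9, 2e10]                 # device speeds in FLOP/s
cuts, shares, makespan = alternate(flops, act_bits, f_devs,
                                   f_srv=1e11, total_bw=1e8)
print(cuts, [round(s, 3) for s in shares], round(makespan, 3))
```

Under this simplified model each step can only keep or lower the slowest device's round time, so the loop stops after a finite number of rounds. A realistic formulation would also have to account for the backward pass, gradient traffic, and contention on the shared server.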