In the evolution toward 6G, integrating Artificial Intelligence (AI) with advanced network infrastructure has emerged as a pivotal strategy for enhancing network intelligence and resource utilization. Existing distributed learning frameworks such as Federated Learning and Split Learning often struggle in dynamic network environments, where they face high synchronization demands, costly communication overhead, heavy computing resource consumption, and data heterogeneity across network nodes. These obstacles hinder the use of the ubiquitous computing capabilities of 6G networks, especially as model parameter counts and training data volumes continue to grow. To address these challenges, this paper introduces "Snake Learning", a cost-effective distributed learning framework. Specifically, Snake Learning respects the heterogeneity of inter-node computing capability and local data distribution in 6G networks, and sequentially trains a designated subset of model layers on each individual node. This layer-by-layer serpentine update mechanism significantly reduces the storage, memory, and communication requirements of the model training phase, and demonstrates superior adaptability and efficiency for both Computer Vision (CV) training and Large Language Model (LLM) fine-tuning tasks across homogeneous and heterogeneous data distributions.
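The sequential, layer-by-layer update described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the scalar "deep linear" model, the one-block-per-node assignment, and the names `train_block` and `snake_pass` are all hypothetical choices made for illustration. The only idea taken from the abstract is that each node, in turn, trains a designated subset of layers on its own local data while the remaining layers stay frozen, and then hands the model to the next node.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def forward(weights: Sequence[float], x: float) -> float:
    """Toy model (assumption): each layer multiplies its input by one scalar."""
    out = x
    for w in weights:
        out *= w
    return out


def mse(weights: Sequence[float], data: Sequence[Sample]) -> float:
    """Mean squared error of the toy model on a data set."""
    return sum((forward(weights, x) - y) ** 2 for x, y in data) / len(data)


def train_block(weights: Sequence[float], block: Sequence[int],
                data: Sequence[Sample], lr: float = 0.05,
                steps: int = 20) -> List[float]:
    """Gradient descent on the layers listed in `block`; all others are frozen."""
    w = list(weights)
    for _ in range(steps):
        grads = {i: 0.0 for i in block}
        for x, y in data:
            err = forward(w, x) - y
            for i in block:
                # d(output)/d(w_i) is x times the product of the other layers.
                others = x
                for j, wj in enumerate(w):
                    if j != i:
                        others *= wj
                grads[i] += 2.0 * err * others / len(data)
        for i in block:
            w[i] -= lr * grads[i]
    return w


def snake_pass(weights: Sequence[float],
               node_data: Sequence[Sequence[Sample]],
               node_blocks: Sequence[Sequence[int]],
               lr: float = 0.05, steps: int = 20) -> List[float]:
    """Visit the nodes in order; each trains its block and passes the model on."""
    w = list(weights)
    for data, block in zip(node_data, node_blocks):
        w = train_block(w, block, data, lr, steps)
    return w


if __name__ == "__main__":
    # Three nodes with different local input ranges (illustrative data, y = 2x).
    node_data = [
        [(0.5, 1.0), (1.0, 2.0)],
        [(1.0, 2.0), (1.5, 3.0)],
        [(0.5, 1.0), (1.5, 3.0)],
    ]
    node_blocks = [[0], [1], [2]]  # hypothetical assignment: one layer per node
    all_data = [s for d in node_data for s in d]
    w0 = [1.0, 1.0, 1.0]
    w1 = snake_pass(w0, node_data, node_blocks)
    print("loss before:", mse(w0, all_data))
    print("loss after: ", mse(w1, all_data))
```

In this sketch only one node is active at a time and it needs gradients for its own block alone, which is the intuition behind the reduced synchronization and memory requirements claimed above. A real deployment would differ in how layers are assigned to nodes and in what is transmitted between them.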