Large Language Models (LLMs) perform well on many NLP tasks, and demand for privacy-preserving personalization is growing, but fine-tuning them on resource-constrained mobile devices remains challenging because of high memory and computation costs. Federated Learning (FL) enables training on local data, yet existing methods either rely on memory-intensive backpropagation or use zeroth-order optimization (ZOO), which avoids backward passes but suffers from slow convergence and degraded accuracy. We propose CooperLLM, a cloud-assisted edge-end cooperative federated fine-tuning framework that combines ZOO on mobile devices with cloud-guided gradient rectification. Mobile clients perform lightweight ZOO updates on private data, while the cloud fine-tunes on auxiliary public data using backpropagation and injects guided perturbations to rectify the local updates, improving convergence and accuracy without violating privacy. To address system bottlenecks, CooperLLM introduces pipeline scheduling and adaptive compression, which overlap computation with communication and reduce memory usage. Experiments on multiple Transformer models and datasets show that, compared with state-of-the-art ZOO-based baselines, CooperLLM reduces on-device memory by up to $86.4\%$, accelerates convergence by $8.8\times$, and improves accuracy by up to 10 percentage points.
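To make the zeroth-order idea concrete, the following is a minimal sketch of a generic two-point (SPSA-style) gradient estimator, which needs only forward evaluations of the loss and no backward pass. It is not CooperLLM's implementation: the names `zo_gradient` and `quadratic_loss`, the toy quadratic objective, the step size, and the optional `direction` argument are all assumptions made for illustration. In particular, `direction` is only a hypothetical stand-in for the idea of a guided perturbation; the abstract does not specify how the cloud's guidance is actually injected.

```python
import numpy as np


def zo_gradient(loss_fn, params, eps=1e-3, rng=None, direction=None):
    """Two-point zeroth-order gradient estimate (generic SPSA-style sketch).

    Uses two forward evaluations of loss_fn and no backward pass.
    `direction` is a hypothetical hook: if given, it replaces the random
    perturbation (an assumption, not the paper's mechanism).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Perturbation direction: random Gaussian unless a guide is supplied.
    z = rng.standard_normal(params.shape) if direction is None else direction
    # Central finite difference of the loss along z.
    loss_plus = loss_fn(params + eps * z)
    loss_minus = loss_fn(params - eps * z)
    projected = (loss_plus - loss_minus) / (2.0 * eps)
    # Scale the direction by the estimated directional derivative.
    return projected * z


def quadratic_loss(w):
    """Toy objective standing in for a model loss; its true gradient is w."""
    return 0.5 * float(np.sum(w ** 2))


rng = np.random.default_rng(0)
w = np.ones(4)
initial_loss = quadratic_loss(w)

# Plain ZOO descent: every step costs two forward passes.
for _ in range(200):
    w = w - 0.05 * zo_gradient(quadratic_loss, w, rng=rng)
final_loss = quadratic_loss(w)

# With a unit direction aligned to the true gradient, the estimate
# recovers the gradient itself on this toy problem.
w0 = np.ones(4)
guided = zo_gradient(quadratic_loss, w0, direction=w0 / np.linalg.norm(w0))
```

On this toy problem the random-direction estimate is noisy but still drives the loss down, while a perturbation aligned with the true gradient recovers it in a single step. That gap is one way to picture why guidance from a backpropagation-capable party could speed up ZOO, though the actual rectification scheme may differ substantially from this sketch.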