In recent years, Large Language Models (LLMs) based on the Transformer architecture have come to dominate many machine learning tasks, especially text processing. However, these models require massive amounts of training data and impose high resource demands, particularly in terms of the large number of Floating Point Operations (FLOPs) and the memory footprint. To fine-tune such models in a parameter-efficient way, techniques such as Adapters and LoRA have been developed. However, we observe that applying LoRA in federated learning (FL), while still parameter-efficient, is memory- and FLOP-inefficient. Based on this observation, we develop a novel layer-finetuning scheme that allows devices in cross-device FL to make use of pretrained neural networks (NNs) while adhering to given resource constraints. We show that our scheme outperforms the current state of the art under homogeneous or heterogeneous computation and memory constraints and is on par with LoRA under limited communication, achieving significantly higher accuracies in FL training.
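To make the observation about LoRA concrete, the sketch below (a minimal illustration, not the paper's implementation; the dimensions, rank, and initialization are assumptions) shows why a LoRA-style layer is parameter-efficient yet not FLOP- or memory-efficient: the frozen dense weight `W` still participates in every forward pass, so the dominant matrix multiply and its activation memory remain, even though only the small low-rank factors `A` and `B` are trained.

```python
import numpy as np

# Hypothetical dimensions: a d x k linear layer adapted with rank r.
d, k, r = 512, 512, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen pretrained weight (not trained)
A = rng.standard_normal((r, k)) * 0.01   # trainable low-rank factor, r x k
B = np.zeros((d, r))                     # trainable low-rank factor, d x r (zero-init)

x = rng.standard_normal(k)

# Forward pass: the full dense multiply W @ x is still required,
# so FLOPs and activation memory stay close to full finetuning.
y = W @ x + B @ (A @ x)

# Only r*(d + k) parameters are trained, versus d*k for the full layer.
trainable = A.size + B.size              # 8 * (512 + 512) = 8192
full = W.size                            # 512 * 512 = 262144
print(trainable / full)                  # 0.03125 -> ~3% of parameters
```

With `r = 8` only about 3% of the layer's parameters are updated, yet the per-step compute is essentially unchanged; this gap between parameter efficiency and compute/memory efficiency is what motivates the layer-finetuning scheme in the abstract above.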