Trained on massive publicly available data, large language models (LLMs) have demonstrated tremendous success across various fields. While more data contributes to better performance, a disconcerting reality is that high-quality public data will be exhausted in a few years. In this paper, we offer a potential next step for contemporary LLMs: collaborative and privacy-preserving LLM training on underutilized distributed private data via federated learning (FL), where multiple data owners collaboratively train a shared model without transmitting raw data. To achieve this, we build a concise, integrated, and research-friendly framework and codebase named OpenFedLLM. It covers federated instruction tuning for enhancing instruction-following capability, federated value alignment for aligning with human values, and 7 representative FL algorithms. In addition, OpenFedLLM supports training on diverse domains, covering 8 training datasets, and provides comprehensive evaluations, covering 30+ evaluation metrics. Through extensive experiments, we observe that all FL algorithms outperform local training when training LLMs, with a clear performance improvement across a variety of settings. Notably, on a financial benchmark, Llama2-7B fine-tuned with any of the FL algorithms outperforms GPT-4 by a significant margin, while the model obtained through individual training does not, which gives clients strong motivation to participate in FL. The code is available at https://github.com/rui-ye/OpenFedLLM.
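To make the idea of training a shared model without transmitting raw data concrete, the following is a minimal sketch of one server-side aggregation step in the style of FedAvg, a widely used FL baseline. It is an illustration only: the abstract does not name the 7 algorithms, and the function `fedavg_aggregate`, the `Params` type, and the toy parameter name `"w"` are all hypothetical and are not taken from the OpenFedLLM codebase. Each client is assumed to have already trained locally and to send only its parameters and its local sample count.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def fedavg_aggregate(client_updates: List[Tuple[Params, int]]) -> Params:
    """Average client parameters, weighted by each client's sample count.

    Only parameters and counts reach the server; raw data stays on clients.
    """
    total_samples = sum(num_samples for _, num_samples in client_updates)
    if total_samples <= 0:
        raise ValueError("at least one client with a positive sample count is required")

    # Start from zeros with the same shape as the first client's parameters.
    first_params = client_updates[0][0]
    aggregated: Params = {
        name: [0.0] * len(values) for name, values in first_params.items()
    }

    for params, num_samples in client_updates:
        weight = num_samples / total_samples
        for name, values in params.items():
            for i, value in enumerate(values):
                aggregated[name][i] += weight * value
    return aggregated


if __name__ == "__main__":
    # Two toy clients: the second holds three times as much data as the first.
    updates = [
        ({"w": [1.0, 2.0]}, 1),
        ({"w": [3.0, 4.0]}, 3),
    ]
    print(fedavg_aggregate(updates))  # {'w': [2.5, 3.5]}
```

In an LLM setting the exchanged parameters would more plausibly be a small set of fine-tuned weights than the full model, but that choice is an assumption here and is not specified in the abstract.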