Transformers, a cornerstone of deep-learning architectures for sequential data, have achieved state-of-the-art results in tasks like Natural Language Processing (NLP). Models such as BERT and GPT-3 exemplify their success and have driven the rise of large language models (LLMs). However, a critical challenge persists: safeguarding the privacy of data used in LLM training. Privacy-preserving techniques like Federated Learning (FL) offer potential solutions, but practical limitations hinder their effectiveness for Transformer training. Two primary issues are (I) the risk of sensitive information leakage due to aggregation methods like FedAvg or FedSGD, and (II) the high communication overhead caused by the large size of Transformer models. This paper introduces a novel FL method that reduces communication overhead while maintaining competitive utility. Our approach avoids sharing full model weights by simulating a global model locally. We apply k-means clustering to each Transformer layer, compute centroids locally, and transmit only these centroids to the server instead of full weights or gradients. To enhance security, we leverage Intel SGX for secure transmission of centroids. Evaluated on a translation task, our method achieves utility comparable to state-of-the-art baselines while significantly reducing communication costs. This provides a more efficient and privacy-preserving FL solution for Transformer models.
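The core communication-saving idea above — cluster each layer's weights with k-means and transmit only the centroids — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the clustering granularity (per-weight vs. per-row), the choice of k, initialization, and the SGX-protected transport are all left unspecified by the abstract, so the details below are assumptions.

```python
import random

def layer_centroids(weights, k, iters=20, seed=0):
    """Cluster a layer's flattened weights with 1-D k-means and return
    the k centroids -- the only values a client would need to transmit,
    instead of the full weight tensor. (Hypothetical sketch: granularity,
    k, and initialization are assumptions, not the paper's settings.)"""
    rng = random.Random(seed)
    centroids = rng.sample(weights, k)  # random distinct initial centroids
    for _ in range(iters):
        # assignment step: put each weight in its nearest centroid's bucket
        buckets = [[] for _ in range(k)]
        for w in weights:
            j = min(range(k), key=lambda j: (w - centroids[j]) ** 2)
            buckets[j].append(w)
        # update step: recompute each centroid as its bucket mean
        # (keep the old centroid if the bucket is empty)
        centroids = [sum(b) / len(b) if b else centroids[j]
                     for j, b in enumerate(buckets)]
    return centroids

# Toy "layer" of 6 weights compressed to k=2 centroids:
# 6 floats would be sent as just 2, a 3x reduction even in this tiny case.
w = [0.11, 0.09, 0.10, 0.90, 1.05, 0.95]
c = sorted(layer_centroids(w, k=2))
print(c)  # two centroids, one near each group of weights
```

In a real Transformer layer the savings are far larger: a weight matrix with millions of entries collapses to k centroid values (plus, in typical weight-clustering schemes, a compact assignment index), which is what makes the per-round upload to the server cheap.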