Privately Fine-Tuning Large Language Models with Differential Privacy

Pre-trained Large Language Models (LLMs) are an integral part of modern AI and have led to breakthrough performance on complex AI tasks. Major AI companies with expensive infrastructure are able to develop and train these large models, with millions to billions of parameters, from scratch. Third parties, researchers, and practitioners are increasingly adopting these pre-trained models and fine-tuning them on their private data to accomplish downstream AI tasks. However, it has been shown that an adversary can extract or reconstruct exact training samples from these LLMs, which can reveal personally identifiable information. This issue has raised deep concerns about the privacy of LLMs. Differential privacy (DP) provides a rigorous framework for adding noise during the training or fine-tuning of LLMs such that extracting the training data becomes infeasible (i.e., it succeeds only with cryptographically small probability). Most extant studies derive their theoretical privacy guarantees by assuming that models are learned from scratch through many training iterations in an asymptotic setting. This assumption does not hold in fine-tuning scenarios, where the number of training iterations is significantly smaller. To address this gap, we present \ewtune, a DP framework for fine-tuning LLMs based on the Edgeworth accountant with finite-sample privacy guarantees. Our results across four well-established natural language understanding (NLU) tasks show that while \ewtune~adds privacy guarantees to the LLM fine-tuning process, it directly contributes to decreasing the induced noise by up to 5.6\% and improves the performance of state-of-the-art LLMs by up to 1.1\% across all NLU tasks. We have open-sourced our implementation for wide adoption and public testing.
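To make "adding noise in the process of training or fine-tuning" concrete, the sketch below shows one update of generic DP-SGD. Each per-example gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to that norm is added, and the noisy average is used for the step. This is an illustrative sketch under stated assumptions, not the \ewtune implementation. The function name `dp_sgd_step` and its parameters (`clip_norm`, `noise_multiplier`) are hypothetical. The sketch does not cover the privacy accounting step (Edgeworth or otherwise), which is what converts `noise_multiplier`, the sampling rate, and the number of steps into an (ε, δ) guarantee.

```python
import numpy as np


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One generic DP-SGD update (hypothetical helper, not EW-Tune's code).

    params: shape (num_params,)
    per_example_grads: shape (batch_size, num_params), one gradient row per example
    """
    # L2 norm of each example's gradient.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Rescale rows so that no example's gradient exceeds clip_norm.
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    # After clipping, one example changes the sum by at most clip_norm,
    # so the Gaussian noise std is noise_multiplier * clip_norm.
    summed = clipped.sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    noisy_mean = (summed + noise) / per_example_grads.shape[0]
    return params - lr * noisy_mean


# Usage with synthetic gradients (assumed data, for illustration only).
rng = np.random.default_rng(0)
params = np.zeros(4)
grads = rng.normal(size=(8, 4)) * 5.0
new_params = dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                         noise_multiplier=1.1, rng=rng)
print(new_params.shape)  # (4,)
```

A larger `noise_multiplier` gives stronger privacy per step but noisier updates. A tighter accountant can certify the same (ε, δ) target with less noise, which is the lever the abstract's noise-reduction result refers to.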