Pre-trained Large Language Models (LLMs) are an integral part of modern AI and have led to breakthrough performance in complex AI tasks. Major AI companies with expensive infrastructure are able to develop and train these large models, with millions to billions of parameters, from scratch. Third parties, researchers, and practitioners are increasingly adopting these pre-trained models and fine-tuning them on their private data to accomplish their downstream AI tasks. However, it has been shown that an adversary can extract or reconstruct exact training samples from these LLMs, which can reveal personally identifiable information. This issue has raised deep concerns about the privacy of LLMs. Differential privacy (DP) provides a rigorous framework that allows adding noise in the process of training or fine-tuning LLMs such that extracting the training data becomes infeasible (i.e., succeeds with only a cryptographically small probability). While the theoretical privacy guarantees offered in most extant studies assume learning models from scratch through many training iterations in an asymptotic setting, this assumption does not hold in fine-tuning scenarios, in which the number of training iterations is significantly smaller. To address this gap, we present \ewtune, a DP framework for fine-tuning LLMs based on the Edgeworth accountant with finite-sample privacy guarantees. Our results across four well-established natural language understanding (NLU) tasks show that while \ewtune~adds privacy guarantees to the LLM fine-tuning process, it directly contributes to decreasing the induced noise by up to 5.6\% and improves state-of-the-art LLM performance by up to 1.1\% across all NLU tasks. We have open-sourced our implementation for wide adoption and public testing.