As pre-trained language models grow in size, fully fine-tuning all of their parameters on task-adaptation data becomes increasingly impractical. To address this challenge, low-rank adaptation methods have been proposed, e.g. LoRA, which injects trainable low-rank decomposition matrices into only a subset of the pre-trained model's parameters, called adapters. This approach significantly reduces the number of trainable parameters compared to fine-tuning all parameters, or even all adapter parameters. In this work, we examine low-rank adaptation through the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA is equivalent to fine-tuning the adapters with noisy batch gradients, just as the DPSGD algorithm does, and we quantify the variance of the injected noise as a decreasing function of the adaptation rank. By establishing a Berry-Esseen-type bound on the total variation distance between the injected noise distribution and a Gaussian noise distribution of the same variance, we show that the dynamics of low-rank adaptation closely match those of DPSGD performed with respect to the adapters. Consistent with these theoretical findings, and supported by our experimental results, we show that low-rank adaptation provides robustness to membership inference attacks with respect to the fine-tuning data.
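To make the parameter-count argument concrete, the following is a minimal NumPy sketch of the rank-r update that LoRA adds to a frozen weight matrix. The dimensions, variable names, and initialization scale are illustrative assumptions, not the paper's experimental setup; the standard LoRA convention of initializing B to zero (so the adapted model starts identical to the pre-trained one) is used here.

```python
import numpy as np

# Illustrative dimensions only: a d_out x d_in adapter weight, rank r << min(d_out, d_in).
d_out, d_in, r = 64, 32, 4

rng = np.random.default_rng(0)
W0 = rng.standard_normal((d_out, d_in))    # frozen pre-trained weight (not trained)
B = np.zeros((d_out, r))                   # trainable; zero init, so B @ A = 0 at start
A = rng.standard_normal((r, d_in)) * 0.01  # trainable; small random init

# LoRA forward pass: frozen weight plus the trainable rank-r correction.
x = rng.standard_normal(d_in)
y = W0 @ x + B @ (A @ x)

# Trainable-parameter count: r * (d_out + d_in) for LoRA
# versus d_out * d_in for fully fine-tuning this weight.
full_params = d_out * d_in
lora_params = r * (d_out + d_in)
print(full_params, lora_params)
```

With these toy dimensions the rank-4 update trains 384 parameters instead of 2048, and because B starts at zero, the initial forward pass reproduces the frozen model exactly.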