Natural language processing models have seen a significant surge in use in recent years, and numerous applications are now built on them. Many of these applications require fine-tuning generic base models on customized, proprietary datasets. Such fine-tuning data is especially likely to contain personal or sensitive information about individuals, which increases privacy risk. Membership inference attacks, which try to determine whether a given record was part of a model's training data, are the most commonly used attacks for assessing a machine learning model's privacy leakage. However, little research has examined which factors affect the vulnerability of language models to this kind of attack, or how well different defense strategies apply in the language domain. We provide the first systematic review of the vulnerability of fine-tuned large language models to membership inference attacks, the factors that influence it, and the effectiveness of different defense strategies. We find that some training methods significantly reduce privacy risk, and that combining differential privacy with low-rank adaptors provides the best protection against these attacks.