The pretraining and fine-tuning paradigm has become the leading technique for a wide range of NLP applications. However, recent studies reveal that fine-tuning data, because of their sensitive nature, domain-specific characteristics, and identifiability, pose significant privacy risks. To help develop more privacy-resilient fine-tuned models, we introduce a novel active privacy auditing framework, dubbed Parsing, designed to identify and quantify privacy leakage risks during the supervised fine-tuning (SFT) of language models (LMs). The framework leverages improved white-box membership inference attacks (MIAs) as its core technology, using novel learning objectives and a two-stage pipeline to monitor the privacy of the LM fine-tuning process and maximize the exposure of privacy risks. We also improve the effectiveness of MIAs on large LMs, including GPT-2, Llama2, and several of their variants. Our work aims to provide the LM fine-tuning community with a reliable, ready-to-use privacy auditing tool, and to offer valuable insights into safeguarding privacy during the fine-tuning process. Experimental results confirm the framework's efficiency across various models and tasks, highlighting notable privacy concerns in the fine-tuning process. Project code is available at https://github.com/mapleleavesss/PARSING.
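The framework's core signal is membership inference: a member sample typically receives a lower loss from the fine-tuned model than a non-member. As a rough illustration only (a generic loss-based score, not Parsing's improved white-box attack or two-stage pipeline), the sketch below computes such a signal on a causal LM; the model name and threshold are placeholder assumptions.

```python
# Minimal sketch of a loss-based membership inference signal on a
# fine-tuned causal LM. Generic illustration only; NOT the Parsing
# framework's white-box attack. Model name and threshold are
# placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: substitute the fine-tuned checkpoint under audit
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def sample_loss(text: str) -> float:
    """Per-sample language-modeling loss; a lower loss suggests the
    model may have seen this text during fine-tuning."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model(ids, labels=ids)
    return out.loss.item()

def is_member(text: str, threshold: float = 3.0) -> bool:
    # assumption: in practice the threshold is calibrated on a
    # held-out set of known non-member samples
    return sample_loss(text) < threshold
```

In a practical audit, the per-sample scores would be calibrated against known non-members to trade off true-positive and false-positive rates, rather than using a fixed threshold.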