As fine-tuning large language models (LLMs) becomes increasingly prevalent, users often rely on third-party services with limited visibility into their fine-tuning processes. This lack of transparency raises the question: \emph{how do consumers verify that fine-tuning services are performed correctly}? For instance, a service provider could claim to fine-tune a model for each user, yet simply return the same base model to every user. To address this issue, we propose vTune, a simple method that adds a small number of \textit{backdoor} data points to the training data to provide a statistical test for verifying that a provider fine-tuned a custom model on a particular user's dataset. Unlike existing works, vTune scales to verifying fine-tuning on state-of-the-art LLMs, and can be used with both open-source and closed-source models. We test our approach across several model families and sizes, as well as across multiple instruction-tuning datasets, and find that the statistical test is satisfied with p-values on the order of $\sim 10^{-40}$, with no negative impact on downstream task performance. Further, we explore several attacks that attempt to subvert vTune and demonstrate the method's robustness to these attacks.
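To give intuition for how a backdoor-based statistical test of this kind can reach p-values on the order of $10^{-40}$, here is a minimal sketch. It computes a one-sided binomial tail probability: the chance that a model which was \emph{not} fine-tuned on the user's data would nonetheless emit the planted backdoor completions. The function name, the number of trigger prompts, and the null-hypothesis match probability are illustrative assumptions, not the exact test statistic defined by vTune.

```python
from math import comb


def backdoor_pvalue(n_triggers: int, n_activated: int, p_null: float) -> float:
    """One-sided binomial tail: probability of observing at least
    n_activated backdoor completions purely by chance, under the null
    hypothesis that the model was NOT fine-tuned on the user's data.

    (Illustrative sketch; not the paper's exact test statistic.)
    """
    return sum(
        comb(n_triggers, k) * p_null**k * (1 - p_null) ** (n_triggers - k)
        for k in range(n_activated, n_triggers + 1)
    )


# Hypothetical numbers: 20 backdoor prompts all elicit the planted
# completion, and an untuned model matches a given completion with
# probability at most 1e-2 -> p-value of roughly (1e-2)^20 = 1e-40.
p = backdoor_pvalue(20, 20, 1e-2)
```

Under these assumed parameters, observing all twenty planted completions drives the tail probability to roughly $10^{-40}$, consistent with the order of magnitude reported above.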