Blackbox Model Provenance via Palimpsestic Membership Inference

Suppose Alice trains an open-weight language model and Bob uses a blackbox derivative of Alice's model to produce text. Can Alice prove that Bob is using her model, either by querying Bob's derivative model (query setting) or from the text alone (observational setting)? We formulate this question as an independence testing problem, in which the null hypothesis is that Bob's model or text is independent of Alice's randomized training run, and investigate it through the lens of palimpsestic memorization in language models: models are more likely to memorize data seen later in training, so we can test whether Bob is using Alice's model with test statistics that capture the correlation between Bob's model or text and the ordering of training examples in Alice's training run. If Alice has randomly shuffled her training data, then any significant correlation amounts to exactly quantifiable statistical evidence against the null hypothesis, regardless of the composition of Alice's training data. In the query setting, we directly estimate (via prompting) the likelihood that Bob's model assigns to Alice's training examples and correlate it with the order in which those examples were seen; across more than 40 fine-tunes of various Pythia and OLMo base models ranging from 1B to 12B parameters, correlating these likelihoods with the base model's training data order yields a p-value on the order of at most 1e-8 in all but six cases. In the observational setting, we try two approaches based on estimating 1) the likelihood of Bob's text overlapping with spans of Alice's training examples and 2) the likelihood of Bob's text with respect to different versions of Alice's model that we obtain by repeating the last phase (e.g., 1%) of her training run on reshuffled data. The second approach can reliably distinguish Bob's text from as little as a few hundred tokens; the first does not involve any retraining but requires many more tokens (several hundred thousand) to achieve high power.
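
The query-setting idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Alice already holds one log-likelihood per training example, obtained by querying Bob's model, and the choice of Spearman rank correlation, the permutation-based p-value, and all function names (`ranks`, `spearman`, `order_correlation_test`) are assumptions made for illustration. Because Alice shuffled her data at random, re-shuffling the training positions simulates the null hypothesis directly, whatever the data contain.

```python
import random


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = (var_x * var_y) ** 0.5
    return cov / denom if denom > 0 else 0.0


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def order_correlation_test(train_positions, log_likelihoods, n_perm=2000, seed=0):
    """One-sided permutation test for 'later examples are more likely'.

    train_positions[i] is the step at which example i was seen in Alice's
    run; log_likelihoods[i] is the log-likelihood Bob's model gives it.
    Returns (observed correlation, permutation p-value).
    """
    rng = random.Random(seed)
    pos_ranks = ranks(train_positions)
    ll_ranks = ranks(log_likelihoods)
    observed = pearson(pos_ranks, ll_ranks)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pos_ranks)  # simulate an independent random ordering
        if pearson(pos_ranks, ll_ranks) >= observed:
            hits += 1
    # The +1 terms keep the p-value valid (never exactly zero).
    return observed, (1 + hits) / (1 + n_perm)


if __name__ == "__main__":
    # Synthetic data only: a "derived" model whose log-likelihood drifts
    # upward with training position, and an unrelated model with no drift.
    data_rng = random.Random(1)
    positions = list(range(200))
    derived = [0.02 * t + data_rng.gauss(0, 1) for t in positions]
    unrelated = [data_rng.gauss(0, 1) for _ in positions]
    print("derived:  ", order_correlation_test(positions, derived))
    print("unrelated:", order_correlation_test(positions, unrelated))
```

A permutation test can report nothing smaller than 1 / (n_perm + 1), so p-values as small as the 1e-8 quoted above would need an analytic reference distribution for the statistic rather than resampling; the sketch only shows the shape of the test.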
