We consider the following problem: given the weights of two models, can we test whether they were trained independently -- i.e., from independent random initializations? We consider two settings: constrained and unconstrained. In the constrained setting, we make assumptions about model architecture and training and propose a family of statistical tests that yield exact p-values with respect to the null hypothesis that the models were trained from independent random initializations. These p-values are valid regardless of the composition of either model's training data; we compute them by simulating exchangeable copies of each model under our assumptions and comparing various similarity measures of weights and activations between the original two models and these copies. We report the p-values from these tests on pairs of 21 open-weight models (210 pairs in total) and correctly identify all pairs of non-independent models. Our tests remain effective even if one model was fine-tuned for many tokens. In the unconstrained setting, where we make no assumptions about training procedures, allow for changes in model architecture, and account for adversarial evasion attacks, the previous tests no longer work. Instead, we propose a new test that matches hidden activations between two models and is robust both to adversarial transformations and to changes in model architecture. The test can also perform localized testing: identifying specific non-independent components of models. Though this test no longer yields exact p-values, empirically we find that its output behaves like one and reliably identifies non-independent models. Notably, we can use the test to identify specific parts of one model that are derived from another (e.g., how Llama 3.1-8B was pruned to initialize Llama 3.2-3B, or shared layers between Mistral-7B and StripedHyena-7B), and it is even robust to retraining individual layers of either model from scratch.
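The constrained-setting procedure can be sketched as a permutation test: permuting a network's hidden units leaves its function unchanged, so under the null hypothesis of independent random initialization the observed similarity between two models is exchangeable with the similarities computed against permuted copies, and the rank-based p-value is exact. The sketch below is a minimal illustration under simplifying assumptions (a single weight matrix per model and a row-aligned correlation statistic); both the statistic and the function names are stand-ins, not the paper's actual similarity measures.

```python
import numpy as np

rng = np.random.default_rng(0)

def aligned_similarity(w_a, w_b):
    # Mean absolute correlation between row i of model A's weight matrix
    # and row i of model B's. This is sensitive to whether the hidden
    # units line up, which is exactly what permuting them destroys.
    a = (w_a - w_a.mean(1, keepdims=True)) / w_a.std(1, keepdims=True)
    b = (w_b - w_b.mean(1, keepdims=True)) / w_b.std(1, keepdims=True)
    return np.abs((a * b).mean(1)).mean()

def independence_p_value(w_a, w_b, n_copies=99):
    # Exchangeable copies of model B are produced by permuting its hidden
    # units (rows). Under the null of independent initialization, the
    # observed similarity is exchangeable with the copies' similarities,
    # so the rank-based p-value below is exactly valid.
    obs = aligned_similarity(w_a, w_b)
    sims = [aligned_similarity(w_a, w_b[rng.permutation(w_b.shape[0])])
            for _ in range(n_copies)]
    return (1 + sum(s >= obs for s in sims)) / (1 + n_copies)
```

With 99 copies the smallest attainable p-value is 0.01, reached when the original pair is more similar than every permuted copy; a pair of genuinely independent matrices instead yields a p-value that is uniform over {0.01, 0.02, ..., 1.00}.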