Large language models are increasingly customized through fine-tuning and other adaptations, which makes it harder to enforce licensing terms and manage downstream impacts. Tracking model origins is crucial both for protecting intellectual property and for identifying derived models when biases or vulnerabilities are discovered in foundation models. We address this challenge by developing a framework for testing model provenance: whether one model is derived from another. Our approach rests on the key observation that real-world model derivations preserve significant similarities in model outputs, and that these similarities can be detected through statistical analysis. Using only black-box access to models, we employ multiple hypothesis testing to compare model similarities against a baseline established from unrelated models. On two comprehensive real-world benchmarks spanning models from 30M to 4B parameters and comprising over 600 models, our tester achieves 90-95% precision and 80-90% recall in identifying derived models. These results demonstrate the viability of systematic provenance verification in production environments, even when only API access is available.
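The abstract does not specify the similarity statistic or the multiple-testing correction. The sketch below illustrates the general idea under stated assumptions and is not the authors' method. It assumes that similarity is the fraction of shared prompts on which two models return identical outputs, which is a hypothetical choice. Each candidate parent's similarity is scored with an empirical p-value against similarities measured on unrelated models. A Holm-Bonferroni correction, a standard procedure used here only for illustration, is then applied across candidates. All function names are hypothetical.

```python
from typing import Dict, List


def token_match_rate(outputs_a: List[str], outputs_b: List[str]) -> float:
    """Hypothetical similarity: fraction of prompts where two models give identical outputs."""
    if len(outputs_a) != len(outputs_b) or not outputs_a:
        raise ValueError("need equal-length, non-empty output lists")
    return sum(a == b for a, b in zip(outputs_a, outputs_b)) / len(outputs_a)


def empirical_p_value(observed: float, baseline: List[float]) -> float:
    """One-sided empirical p-value: how often an unrelated model is at least as similar."""
    exceed = sum(1 for b in baseline if b >= observed)
    return (1 + exceed) / (1 + len(baseline))


def holm_bonferroni(p_values: Dict[str, float], alpha: float) -> Dict[str, bool]:
    """Holm step-down procedure; controls family-wise error rate across candidate parents."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for i, (name, p) in enumerate(ordered):
        if p <= alpha / (m - i):
            rejected[name] = True  # reject "not derived from this parent"
        else:
            break  # step-down: stop at the first non-rejection
    return rejected


def run_provenance_test(
    similarities: Dict[str, float], baseline: List[float], alpha: float = 0.05
) -> Dict[str, bool]:
    """Flag each candidate parent whose similarity is significantly above the unrelated baseline."""
    p_values = {name: empirical_p_value(s, baseline) for name, s in similarities.items()}
    return holm_bonferroni(p_values, alpha)


if __name__ == "__main__":
    # Illustrative (made-up) similarities of unrelated models to the candidate.
    baseline = [0.10 + 0.004 * i for i in range(50)]
    print(run_provenance_test({"base-A": 0.8, "base-B": 0.21}, baseline))
```

With 50 baseline scores, the smallest attainable p-value is 1/51. The baseline therefore has to be large enough for a corrected threshold such as 0.05/m to be reachable at all. This is one reason a tester of this kind needs a sizeable pool of unrelated models.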