Large language models are trained on vast amounts of internet data, prompting concerns and speculation that they have memorized public benchmarks. Going from speculation to proof of contamination is challenging, as the pretraining data used by proprietary models are often not publicly accessible. We show that it is possible to provide provable guarantees of test set contamination in language models without access to pretraining data or model weights. Our approach leverages the fact that when there is no data contamination, all orderings of an exchangeable benchmark should be equally likely. In contrast, the tendency for language models to memorize example order means that a contaminated language model will find certain canonical orderings to be much more likely than others. Our test flags potential contamination whenever the likelihood of a canonically ordered benchmark dataset is significantly higher than the likelihood after shuffling the examples. We demonstrate that our procedure is sensitive enough to reliably prove test set contamination in challenging situations, including models as small as 1.4 billion parameters, on small test sets of only 1000 examples, and datasets that appear only a few times in the pretraining corpus. Using our test, we audit five popular publicly accessible language models for test set contamination and find little evidence for pervasive contamination.
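The comparison between canonical and shuffled orderings described above can be illustrated with a simple permutation test. The following is a minimal sketch under stated assumptions, not the exact procedure evaluated in this work: `permutation_p_value`, `num_shuffles`, and the two toy scorers are hypothetical names introduced for illustration, and `log_likelihood` stands in for whatever routine returns a model's log-probability of a benchmark serialized in a given order.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(
    examples: Sequence[str],
    log_likelihood: Callable[[Sequence[str]], float],
    num_shuffles: int = 200,
    seed: int = 0,
) -> float:
    """One-sided permutation p-value for 'canonical order is more likely'."""
    rng = random.Random(seed)
    canonical_score = log_likelihood(examples)
    at_least_as_high = 0
    for _ in range(num_shuffles):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        if log_likelihood(shuffled) >= canonical_score:
            at_least_as_high += 1
    # The +1 terms count the canonical ordering itself, so p is never 0.
    return (1 + at_least_as_high) / (1 + num_shuffles)


# Toy stand-ins for a real model's log-likelihood (assumptions, not real models).
benchmark = [f"example {i}" for i in range(50)]
_next = {a: b for a, b in zip(benchmark, benchmark[1:])}


def memorizing_scorer(seq: Sequence[str]) -> float:
    # Rewards each adjacent pair that appears in canonical order,
    # mimicking a model that has memorized example order.
    return float(sum(_next.get(a) == b for a, b in zip(seq, seq[1:])))


def order_blind_scorer(seq: Sequence[str]) -> float:
    # Depends only on which examples are present, not on their order.
    return -float(sum(len(s) for s in seq))


p_contaminated = permutation_p_value(benchmark, memorizing_scorer)
p_clean = permutation_p_value(benchmark, order_blind_scorer)
```

In this toy setup the order-blind scorer assigns every ordering the same score, so its p-value is 1.0, while the memorizing scorer yields the smallest p-value attainable with 200 shuffles (1/201). If the benchmark is exchangeable and the model has not seen it, a p-value computed this way is valid regardless of the model, which is the sense in which such a test can offer a guarantee without access to pretraining data or weights.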