Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods in model development. While their combination significantly improves the efficiency of model training and evaluation, little attention has been paid to the potential contamination introduced by this new model development paradigm. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and the LLM-based evaluators. To study this issue, we first define three common types of relatedness between the data generator LLM and the judge LLM: being the same model, having an inheritance relationship, and belonging to the same model family. Through extensive experiments, we empirically confirm the bias of judges toward their related student models caused by preference leakage across multiple LLM baselines and benchmarks. Further analysis suggests that preference leakage is a pervasive, real-world problem that is harder to detect than previously identified biases in LLM-as-a-judge scenarios. All of these findings indicate that preference leakage is a widespread and challenging problem in the area of LLM-as-a-judge. We release all code and data at: https://github.com/David-Li0406/Preference-Leakage.
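To make the kind of bias described above concrete, the sketch below shows one simple way such leakage could be surfaced: compare how often a judge prefers the student model distilled from its related generator against how often an unrelated judge prefers that same student on the same comparisons. This is a minimal illustration, not the paper's actual metric; the function names (`win_rate`, `leakage_gap`) and the toy judgment lists are hypothetical.

```python
from collections import Counter

def win_rate(judgments, model):
    """Fraction of pairwise comparisons won by `model`.

    `judgments` is a list of winner labels, one per comparison;
    a "tie" counts as half a win for each side.
    """
    counts = Counter(judgments)
    wins = counts[model] + 0.5 * counts["tie"]
    return wins / len(judgments)

def leakage_gap(related_judge, neutral_judge, student):
    """Hypothetical bias estimate: how much more often a judge prefers
    its related student model than an unrelated judge does."""
    return win_rate(related_judge, student) - win_rate(neutral_judge, student)

# Toy example: judge A compares student_A (trained on A-generated data)
# with student_B; an unrelated judge C scores the same response pairs.
judge_a_verdicts = ["student_A", "student_A", "tie", "student_B", "student_A"]
judge_c_verdicts = ["student_A", "student_B", "tie", "student_B", "student_B"]

print(f"leakage gap: {leakage_gap(judge_a_verdicts, judge_c_verdicts, 'student_A'):+.2f}")
# A positive gap would be consistent with the judge favoring its related student.
```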