As Large Language Models (LLMs) are pretrained on massive corpora, data contamination has become an increasingly severe problem, potentially leading to overestimated model performance during evaluation. To address this, we propose AdEval (Alignment-based Dynamic Evaluation), a dynamic data evaluation method that mitigates the impact of data contamination on evaluation reliability. Experimental results on multiple datasets demonstrate that AdEval effectively reduces the influence of data contamination on evaluation outcomes, improving both the fairness and the reliability of the evaluation process.