LLM-as-a-Judge has become a dominant approach to automated evaluation, playing a critical role in areas such as model alignment, leaderboard construction, and quality control. However, its scalability and trustworthiness can be substantially undermined by Self-Preference Bias (SPB), a directional evaluative deviation in which LLMs systematically favor or disfavor their own outputs when acting as judges. Existing measurements rely on costly human annotations and conflate generative capability with evaluative stance, which makes them impractical for large-scale deployment in real-world systems. To address this issue, we introduce a fully automated framework for quantifying and mitigating SPB. The framework constructs equal-quality response pairs, i.e., pairs whose quality differences are negligible, which enables statistical disentanglement of discriminability from bias propensity without human gold standards. Empirical analysis across 20 mainstream LLMs reveals that advanced capabilities are often uncorrelated, or even negatively correlated, with low SPB. To mitigate this bias, we propose a structured multi-dimensional evaluation strategy grounded in cognitive load decomposition, which reduces SPB by 31.5\% on average.
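To make the equal-quality idea concrete, the following is a minimal sketch and not the metric defined in this work. It assumes that, on pairs whose quality differences are negligible, an unbiased judge should pick its own response about half the time, so a signed deviation of the self-pick rate from 0.5 can serve as a rough indicator of self-preference. The function names `self_preference_score` and `two_sided_binomial_p`, the 0.5 baseline, and the choice of an exact binomial test are all illustrative assumptions.

```python
from math import comb


def self_preference_score(self_picked: int, total: int) -> float:
    """Signed deviation of the self-pick rate from 0.5 (hypothetical indicator).

    Positive values mean the judge favors its own responses on
    equal-quality pairs; negative values mean it disfavors them.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= self_picked <= total:
        raise ValueError("self_picked must lie in [0, total]")
    return self_picked / total - 0.5


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5.

    Sums the probability of every outcome at least as far from n / 2 as k.
    This is an assumed significance check, not one taken from the source.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    deviation = abs(k - n / 2)
    tail = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= deviation)
    return tail / 2 ** n


if __name__ == "__main__":
    # Hypothetical counts: the judge chose its own answer in 70 of 100
    # equal-quality pairs.
    print(self_preference_score(70, 100))
    print(two_sided_binomial_p(70, 100))
```

Because the pairs are constructed to be of equal quality, a deviation from 0.5 in this sketch cannot be attributed to the judge correctly recognizing a better answer, which is the intuition behind separating discriminability from bias propensity.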