Multilingual Retrieval-Augmented Generation (mRAG) systems often exhibit a perceived preference for high-resource languages, particularly English, resulting in the widespread adoption of English pivoting. While prior studies attribute this advantage to the superior English-centric capabilities of Large Language Models (LLMs), we find that such measurements are significantly distorted by structural priors inherent in evaluation benchmarks. Specifically, we identify exposure bias and a gold availability prior (both driven by the disproportionate concentration of resources in English), as well as cultural priors rooted in topic locality, as factors that hinder accurate assessment of genuine language preference. To address these biases, we propose DeLP (Debiased Language Preference), a calibrated metric designed to explicitly factor out these structural confounds. Our analysis using DeLP reveals that the previously reported English preference is largely a byproduct of evidence distribution rather than an inherent model bias. Instead, we find that retrievers fundamentally favor monolingual alignment between the query and the document language. Building on this insight, we introduce DELTA (DEbiased Language preference-guided Text Augmentation), a lightweight and efficient mRAG framework that strategically leverages monolingual alignment to optimize cross-lingual retrieval and generation. Experimental results demonstrate that DELTA consistently outperforms English pivoting and mRAG baselines across diverse languages.