Multilingual Retrieval-Augmented Generation (mRAG) systems often exhibit a perceived preference for high-resource languages, particularly English, resulting in the widespread adoption of English pivoting. While prior studies attribute this advantage to the superior English-centric capabilities of Large Language Models (LLMs), we find that such measurements are significantly distorted by structural priors inherent in evaluation benchmarks. Specifically, we identify exposure bias and a gold-availability prior, both driven by the disproportionate concentration of resources in English, as well as cultural priors rooted in topic locality, as factors that hinder accurate assessment of genuine language preference. To address these biases, we propose DeLP (Debiased Language Preference), a calibrated metric designed to explicitly factor out these structural confounds. Our analysis using DeLP reveals that the previously reported English preference is largely a byproduct of evidence distribution rather than an inherent model bias. Instead, we find that retrievers fundamentally favor monolingual alignment between the query and the document language. Building on this insight, we introduce DELTA (DEbiased Language preference-guided Text Augmentation), a lightweight and efficient mRAG framework that strategically leverages monolingual alignment to optimize cross-lingual retrieval and generation. Experimental results demonstrate that DELTA consistently outperforms English pivoting and mRAG baselines across diverse languages.