Multilingual retrieval-augmented generation (MRAG) requires models to acquire and integrate useful external knowledge from multilingual collections. However, most existing studies adopt a uniform pipeline in which semantically equivalent queries in different languages are handled by a single round of retrieval followed by optimization. Such a ``one-size-fits-all'' strategy is often suboptimal in multilingual settings, because models are prone to knowledge bias and knowledge conflict when interacting with the search engine. To alleviate these issues, we propose LcRL, a multilingual search-augmented reinforcement learning framework that integrates language-coupled Group Relative Policy Optimization (GRPO) into the policy and reward models. We adopt language-coupled group sampling in the rollout module to reduce knowledge bias, and add an auxiliary anti-consistency penalty as a regularizer in the reward model to mitigate knowledge conflict. Experimental results show that LcRL achieves competitive performance and is also suited to practical scenarios such as limited training data and retrieval over collections that span many languages. Our code is available at https://github.com/Cherry-qwq/LcRL-Open.
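To make the idea of language-coupled group sampling more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that rollouts for the same question asked in several languages are pooled into one group, and that advantages are then standardized within that group in the usual GRPO manner. The function names `language_coupled_group` and `group_relative_advantages`, the reward values, and the pooling rule itself are illustrative assumptions; the anti-consistency penalty is left out because the abstract does not give its form.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def language_coupled_group(rollouts_by_language):
    """Pool rollouts for one question across languages into a single group.

    Assumption: this pooling is one plausible reading of "language-coupled
    group sampling"; the actual LcRL rule may differ.
    """
    group = []
    for lang, rollouts in rollouts_by_language.items():
        for answer, reward in rollouts:
            group.append((lang, answer, reward))
    return group


# Toy rollouts (hypothetical values, not results from the paper):
# the same question asked in English and in Chinese, two samples each.
rollouts = {
    "en": [("Paris", 1.0), ("Lyon", 0.0)],
    "zh": [("巴黎", 1.0), ("马赛", 0.0)],
}

group = language_coupled_group(rollouts)
advantages = group_relative_advantages([reward for _, _, reward in group])

for (lang, answer, reward), adv in zip(group, advantages):
    print(f"{lang}\t{answer}\treward={reward}\tadvantage={adv:+.3f}")
```

Under this assumption, a rollout is scored against samples from every language version of the query rather than only its own, so an answer that is correct in one language is not judged relative to a group that a single-language retrieval bias has skewed.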