Multilingual retrieval-augmented generation (mRAG) is often implemented within a fixed retrieval space, typically via query or document translation or multilingual embeddings. However, this approach may be inadequate for culturally grounded queries, where retrieval-condition misalignment can occur: even strong retrievers and generators may struggle to produce culturally relevant answers when evidence is sourced from inappropriate linguistic or regional contexts. To address this, we introduce CORAL (COntext-aware Retrieval with Agentic Loop), an adaptive retrieval methodology for mRAG that iteratively refines both the retrieval space (corpora) and the retrieval probe (query) based on the quality of the evidence. The overall process includes: (1) selecting corpora, (2) retrieving documents, (3) critiquing evidence for relevance and cultural alignment, and (4) checking sufficiency. If the retrieved documents are insufficient to answer the query correctly, the system (5) reselects corpora and rewrites the query. Across two cultural QA benchmarks, CORAL achieves up to a 3.58 percentage point accuracy improvement on low-resource languages relative to the strongest baselines.
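The five steps above can be read as a control loop. The following is a minimal sketch of that loop only, not the authors' implementation: every name in it (`coral_loop`, `Evidence`, `CULTURE_TERMS`, and the toy `select`, `retrieve`, `critique`, `is_sufficient`, and `rewrite` functions) is hypothetical, and the keyword-overlap stubs merely stand in for whatever retriever, critic, and rewriter a real system would use.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Evidence:
    corpus: str  # name of the corpus the passage came from
    text: str


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a toy stand-in for real text processing."""
    return set(re.findall(r"\w+", text.lower()))


def coral_loop(query, corpora, select, retrieve, critique,
               is_sufficient, rewrite, max_rounds=3):
    """Select -> retrieve -> critique -> check; on failure, reselect and rewrite."""
    tried: set[str] = set()
    kept: list[Evidence] = []
    probe = query
    for _ in range(max_rounds):
        chosen = select(probe, corpora, tried)        # steps (1) and (5): pick corpora
        if not chosen:
            break                                     # nothing left to search
        tried.update(chosen)
        docs = [ev for name in chosen                 # step (2): retrieve documents
                for ev in retrieve(probe, name, corpora[name])]
        kept.extend(ev for ev in docs if critique(query, ev))  # step (3): critique
        if is_sufficient(query, kept):                # step (4): sufficiency check
            break
        probe = rewrite(query, kept, tried)           # step (5): rewrite the query
    return kept


# --- Toy components (assumptions for illustration only) ---
CULTURE_TERMS = {"chuseok"}  # hypothetical list of culture-specific terms


def select(probe, corpora, tried):
    """Pick the first corpus not searched yet."""
    for name in corpora:
        if name not in tried:
            return [name]
    return []


def retrieve(probe, name, docs):
    """Return passages that share at least one token with the probe."""
    return [Evidence(name, d) for d in docs if tokens(probe) & tokens(d)]


def critique(query, ev):
    """Keep a passage only if it mentions the query's culture-specific terms."""
    return (tokens(query) & CULTURE_TERMS) <= tokens(ev.text)


def is_sufficient(query, kept):
    return len(kept) >= 1


def rewrite(query, kept, tried):
    return query + " traditional food"


corpora = {
    "en": ["Turkey is eaten on Thanksgiving."],
    "ko": ["Songpyeon is a rice cake eaten on Chuseok."],
}
evidence = coral_loop("What is eaten on Chuseok?", corpora, select, retrieve,
                      critique, is_sufficient, rewrite)
print([ev.corpus for ev in evidence])
```

In this toy run the first round searches the `en` corpus, where the only retrieved passage is rejected by the critic, so the loop rewrites the probe and moves on to the `ko` corpus. Passing the components as functions keeps the loop independent of any particular retriever or model.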