Comprehensively retrieving diverse documents is crucial for queries that admit a wide range of valid answers. We introduce retrieve-verify-retrieve (RVR), a multi-round retrieval framework designed to maximize answer coverage. In the first round, a retriever takes the original query and returns a candidate document set, and a verifier then identifies a high-quality subset. In subsequent rounds, the query is augmented with the previously verified documents to uncover answers not yet covered in earlier rounds. RVR is effective even with off-the-shelf retrievers, and fine-tuning retrievers for this inference procedure brings further gains. Our method outperforms baselines, including agentic search approaches, achieving gains of at least 10% relative and 3% absolute in complete recall percentage on a multi-answer retrieval dataset (QAMPARI). We also observe consistent gains on two out-of-domain datasets (QUEST and WebQuestionsSP) across different base retrievers. Our work presents a promising iterative approach to comprehensive answer recall that leverages a verifier and adapts retrievers to a new inference scenario.
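The round structure of RVR can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Retriever`/`Verifier` interfaces, the toy word-overlap retriever (`make_overlap_retriever`), the substring-based verifier, the per-round depth `k`, and the stop-when-nothing-new rule are all hypothetical. In particular, RVR augments the query text with the verified documents and relies on a (possibly fine-tuned) retriever to surface new answers; here the verified documents are passed as a separate argument and the toy retriever simply skips them.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions, not from the paper):
# a retriever maps (query, previously verified docs, k) to k candidate docs,
# and a verifier decides whether a single doc is a good answer for the query.
Retriever = Callable[[str, Sequence[str], int], List[str]]
Verifier = Callable[[str, str], bool]


def retrieve_verify_retrieve(query: str, retrieve: Retriever, verify: Verifier,
                             max_rounds: int = 3, k: int = 2) -> List[str]:
    """Multi-round retrieval that accumulates verified documents."""
    verified: List[str] = []
    for _ in range(max_rounds):
        # Round 1 sees only the query; later rounds also see the verified docs.
        candidates = retrieve(query, verified, k)
        # Keep candidates that are new and that the verifier accepts.
        new_docs = [d for d in candidates if d not in verified and verify(query, d)]
        if not new_docs:  # assumed stopping rule: nothing new was verified
            break
        verified.extend(new_docs)
    return verified


def make_overlap_retriever(corpus: Sequence[str]) -> Retriever:
    """Toy lexical retriever: ranks docs by word overlap with the query and
    skips docs already in the context, a crude stand-in for a retriever
    trained to find answers not yet covered."""
    def retrieve(query: str, context: Sequence[str], k: int) -> List[str]:
        q = set(query.lower().split())
        pool = [(i, d) for i, d in enumerate(corpus) if d not in context]
        # Higher overlap first; ties broken by corpus position.
        pool.sort(key=lambda item: (-len(q & set(item[1].lower().split())), item[0]))
        return [d for _, d in pool[:k]]
    return retrieve


if __name__ == "__main__":
    corpus = [
        "Paris is a city in France",
        "Lyon is a city in France",
        "Berlin is a city in Germany",
        "Marseille is a city in France",
        "France is a country in Europe",
    ]
    retriever = make_overlap_retriever(corpus)
    # Toy verifier: accepts a doc only if it ends with "city in France".
    verifier = lambda q, d: d.endswith("city in France")
    # 1 round: Paris, Lyon; 3 rounds: Paris, Lyon, Marseille
    for rounds in (1, 3):
        print(rounds, retrieve_verify_retrieve("city in France", retriever, verifier,
                                               max_rounds=rounds, k=2))
```

With `max_rounds=1` the loop reduces to a single retrieve-then-verify pass and finds only two of the three French cities; the second round recovers the third. A real system would replace both toy components with an actual retriever and verifier model.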