Retrieval-augmented generation has achieved strong performance on knowledge-intensive tasks where query-document relevance can be identified through direct lexical or semantic matches. However, many real-world queries involve abstract reasoning, analogical thinking, or multi-step inference, which existing retrievers often struggle to capture. To address this challenge, we present DIVER, a retrieval pipeline designed for reasoning-intensive information retrieval. It consists of four components. The document preprocessing stage enhances readability and preserves content by cleaning noisy texts and segmenting long documents. The query expansion stage leverages large language models to iteratively refine user queries with explicit reasoning and evidence from retrieved documents. The retrieval stage employs a model fine-tuned on synthetic data spanning medical and mathematical domains, along with hard negatives, enabling effective handling of reasoning-intensive queries. Finally, the reranking stage combines pointwise and listwise strategies to produce both fine-grained and globally consistent rankings. On the BRIGHT benchmark, DIVER achieves state-of-the-art nDCG@10 scores of 46.8 overall and 31.9 on original queries, consistently outperforming competitive reasoning-aware models. These results demonstrate the effectiveness of reasoning-aware retrieval strategies in complex real-world tasks.
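For readers unfamiliar with the metric reported above, the following is a minimal sketch of how nDCG@10 is commonly computed for a single query. It is not taken from the DIVER or BRIGHT code. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and the sketch assumes the linear-gain form of DCG (some evaluations use an exponential gain, 2^rel - 1, instead; the two agree when relevance labels are binary). Benchmark scores such as 46.8 are presumably this value averaged over queries and scaled by 100.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k gains, in ranked order.

    Assumes linear gain: the item at 1-based position i contributes
    gain / log2(i + 1).
    """
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(
    ranked_gains: Sequence[float],
    all_gains: Sequence[float],
    k: int = 10,
) -> float:
    """nDCG@k for one query.

    ranked_gains: relevance labels of the retrieved documents, in the
        order the system returned them.
    all_gains: relevance labels of every judged document for the query,
        used to build the ideal (best possible) ranking.
    """
    ideal = dcg_at_k(sorted(all_gains, reverse=True), k)
    if ideal == 0:
        # No relevant documents are known for this query.
        return 0.0
    return dcg_at_k(ranked_gains, k) / ideal


if __name__ == "__main__":
    # Hypothetical query: two relevant documents exist; the system
    # returned them at ranks 1 and 3.
    score = ndcg_at_k(ranked_gains=[1, 0, 1], all_gains=[1, 1, 0])
    print(f"nDCG@10 = {score:.4f}")
```

Because the discount grows with rank, moving a relevant document from rank 3 to rank 2 in the example raises the score to 1.0, which is why a reranking stage can improve nDCG@10 without retrieving any new documents.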