LLM reranking is limited by the top-k documents retrieved by vector similarity, which neither enables contextual query-document token interactions nor captures multimodal relevance distributions. While LLM query reformulation attempts to improve recall by generating improved or additional queries, it is still followed by vector-similarity retrieval. We thus propose to address these top-k retrieval-stage failures by introducing ReBOL, which 1) uses LLM query reformulations to initialize a multimodal Bayesian Optimization (BO) posterior over document relevance, and 2) iteratively acquires document batches for LLM query-document relevance scoring, followed by posterior updates to optimize relevance. After exploring query reformulation and document batch diversification techniques, we evaluate ReBOL against LLM reranker baselines on five BEIR datasets and with two LLMs (Gemini-2.5-Flash-Lite, GPT-5.2). ReBOL consistently achieves higher recall and competitive rankings; for example, against the best LLM reranker on the Robust04 dataset it reaches 46.5% vs. 35.0% recall@100 and 63.6% vs. 61.2% NDCG@10. We also show that ReBOL can achieve latency comparable to LLM rerankers.
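The two-step loop described above can be illustrated with a minimal, self-contained sketch. This is not the paper's actual surrogate model or acquisition function: it uses a toy corpus of random embeddings, an independent per-document Gaussian posterior with a conjugate update, UCB batch acquisition, and a stand-in `llm_score` function in place of a real LLM relevance call. All names and parameters here are hypothetical assumptions for illustration.

```python
import math
import random

random.seed(0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    num = sum(a * b for a, b in zip(u, v))
    du = math.sqrt(sum(a * a for a in u))
    dv = math.sqrt(sum(b * b for b in v))
    return num / (du * dv) if du and dv else 0.0

# Toy corpus: 50 documents as 2-d embeddings (hypothetical stand-ins).
docs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
query = (1.0, 0.5)

def llm_score(d):
    """Stand-in for an LLM query-document relevance call (assumption)."""
    return max(0.0, cosine(d, query)) + random.gauss(0, 0.05)

# Step 1: initialize the posterior mean from similarity to the original
# query plus hypothetical LLM query reformulations.
reformulations = [query, (0.8, 0.7)]
mean = [max(cosine(d, q) for q in reformulations) for d in docs]
var = [1.0] * len(docs)       # prior variance per document
noise = 0.05 ** 2             # assumed LLM-scoring noise variance

# Step 2: iteratively acquire batches by UCB, score them with the "LLM",
# and perform conjugate Gaussian posterior updates.
scored = set()
for _ in range(5):
    ucb = sorted(
        (m + 2.0 * math.sqrt(v), i)
        for i, (m, v) in enumerate(zip(mean, var))
        if i not in scored
    )
    batch = [i for _, i in ucb[-4:]]  # acquire the top-4 UCB documents
    for i in batch:
        y = llm_score(docs[i])
        # Conjugate update for an independent Gaussian per document.
        prec = 1.0 / var[i] + 1.0 / noise
        mean[i] = (mean[i] / var[i] + y / noise) / prec
        var[i] = 1.0 / prec
        scored.add(i)

# Final ranking by posterior mean relevance.
ranking = sorted(range(len(docs)), key=lambda i: -mean[i])
print(ranking[:10])
```

A real instance would replace the per-document posterior with a surrogate that shares information across documents (e.g., a GP over embeddings), so that each LLM score also updates beliefs about unscored documents.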