Retrieval over visually rich documents is essential for tasks such as legal discovery, scientific search, and enterprise knowledge management. Existing approaches fall into two paradigms: single-vector retrieval, which is efficient but coarse, and multi-vector retrieval, which is accurate but computationally expensive. To address this trade-off, we propose HEAVEN, a plug-and-play two-stage hybrid-vector framework. In the first stage, HEAVEN efficiently retrieves candidate pages using a single-vector method over Visually-Summarized Pages (VS-Pages), which assemble representative visual layouts from multiple pages. In the second stage, it reranks the candidates with a multi-vector method, filtering query tokens by linguistic importance to reduce redundant computation. To evaluate retrieval systems under realistic conditions, we also introduce ViMDoc, a benchmark for visually rich, multi-document, and long-document retrieval. Across four benchmarks, HEAVEN attains on average 99.87% of the Recall@1 performance of multi-vector models while reducing per-query computation by 99.82%, achieving both efficiency and accuracy. Our code and datasets are available at: https://github.com/juyeonnn/HEAVEN
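The two-stage idea can be illustrated with a minimal sketch. This is not the HEAVEN implementation: it omits VS-Page construction and scores pages directly in the first stage, it assumes a ColBERT-style MaxSim score for the multi-vector stage, and it stands in for linguistic-importance filtering with a precomputed boolean mask. The names `maxsim_score`, `two_stage_retrieve`, and `keep_mask`, as well as the toy embeddings, are hypothetical.

```python
import numpy as np

def maxsim_score(query_tokens, page_tokens):
    """Late-interaction score (assumed MaxSim): for each query token,
    take its best-matching page token and sum those maxima."""
    sim = query_tokens @ page_tokens.T          # (n_query_tokens, n_page_tokens)
    return float(sim.max(axis=1).sum())

def two_stage_retrieve(query_vec, query_tokens, keep_mask,
                       page_vecs, page_tokens, k=2):
    # Stage 1: cheap single-vector scoring over all pages, keep the top k.
    coarse = page_vecs @ query_vec
    candidates = np.argsort(-coarse)[:k]

    # Stage 2: rerank only the candidates with multi-vector scoring,
    # using only the query tokens marked as important (hypothetical mask).
    filtered = query_tokens[keep_mask]
    scores = {int(i): maxsim_score(filtered, page_tokens[i]) for i in candidates}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: 3 pages with 2-dimensional embeddings (illustrative only).
page_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
page_tokens = [
    np.array([[0.5, 0.5]]),
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[0.0, 1.0]]),
]
query_vec = np.array([1.0, 0.0])
query_tokens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
keep_mask = np.array([True, True, False])       # drop the third query token

ranking = two_stage_retrieve(query_vec, query_tokens, keep_mask,
                             page_vecs, page_tokens, k=2)
print(ranking)  # [1, 0]
```

In this toy example the first stage keeps pages 0 and 1, and the second stage promotes page 1 above page 0 because its token embeddings match both retained query tokens. The expensive token-level comparison runs on only k pages and on a reduced set of query tokens, which is where the per-query savings described above would come from.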