miniReranker: Efficient Multimodal Reranking through Visual Cache Reuse and Interaction Sparsity

Multimodal large language models (MLLMs) have recently shown strong potential as point-wise rerankers, directly modeling query--document relevance through next-token prediction. However, point-wise reranking incurs substantial repeated computation across query--document pairs, and the causal structure of transformers allows only prefix segments to be reused via pre-caching. Existing query-first and document-first formats are misaligned with VQA-style prompting and with computation-aware reuse. To address this, we propose a \textit{vision-first} formulation that improves both cache-reuse efficiency and reranking performance. Even so, the remaining cost is considerable and stems from three main sources: (1) \textit{model depth}, which we address with early exit to reduce the number of active parameters; (2) \textit{cross-segment attention}, which we restrict to a narrow interaction band spanning a few layers; and (3) \textit{visual tokens}, whose number we reduce via embedder-guided pruning. Together, these designs form miniReranker, which reduces reranking runtime to <1% of the dense implementation under high-reuse settings for a single query while preserving >96% of the dense model's performance.
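The cache-reuse argument behind the vision-first formulation can be checked on a toy model. The following is a minimal sketch, not miniReranker's implementation. It assumes a hypothetical single-head causal attention layer with random weights, written in pure Python, where the visual tokens form the prefix. Because of the causal mask, the keys and values of the visual prefix can be computed once and reused for every query, and the query positions come out identical to a full recomputation. If the query comes first instead, the image positions attend to it, so their states change from query to query and cannot be precomputed once. The class `ToyCausalLayer` and helpers such as `prefill` are illustrative names only.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def attend(query, keys, values):
    """Scaled dot-product softmax attention of one query over keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(a * b for a, b in zip(query, k)) for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


class ToyCausalLayer:
    """Hypothetical single-head causal attention layer with random weights."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def rand_matrix():
            return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]

        self.wq, self.wk, self.wv = rand_matrix(), rand_matrix(), rand_matrix()

    def prefill(self, tokens):
        """Keys and values for a prefix: exactly what a KV cache stores."""
        return ([matvec(self.wk, t) for t in tokens],
                [matvec(self.wv, t) for t in tokens])

    def forward(self, tokens, past_keys=(), past_values=()):
        """Causal attention over `tokens`, placed after an optional cached prefix."""
        keys, values = list(past_keys), list(past_values)
        outputs = []
        for t in tokens:
            keys.append(matvec(self.wk, t))
            values.append(matvec(self.wv, t))
            # Each position sees the cached prefix, earlier positions, and itself.
            outputs.append(attend(matvec(self.wq, t), keys, values))
        return outputs


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two lists of vectors."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


dim = 4
rng = random.Random(42)
layer = ToyCausalLayer(dim)
image = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(6)]
queries = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]
           for _ in range(5)]

# Vision-first: encode the visual prefix once, then reuse it for every query.
cache_k, cache_v = layer.prefill(image)
worst = 0.0
for query in queries:
    dense = layer.forward(image + query)[len(image):]  # recompute the image
    cached = layer.forward(query, cache_k, cache_v)     # reuse the cache
    worst = max(worst, max_abs_diff(dense, cached))
print("max |dense - cached| over query positions:", worst)

# Query-first contrast: image positions now attend to the query, so their
# states differ across queries and cannot be precomputed once.
img_after_q0 = layer.forward(queries[0] + image)[len(queries[0]):]
img_after_q1 = layer.forward(queries[1] + image)[len(queries[1]):]
contrast = max_abs_diff(img_after_q0, img_after_q1)
print("image-state change between two query-first inputs:", contrast)
```

In a deeper model the same reasoning applies layer by layer, because under a causal mask the prefix's hidden states never depend on later tokens. This sketch does not model the early-exit, interaction-band, or embedder-guided pruning components described above.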