ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System

Multi-vector embedding models have emerged as a powerful paradigm for document retrieval, preserving fine-grained visual and textual details through token-level representations. However, this expressiveness comes at a staggering cost: storing embeddings for every token inflates index sizes by over $1000\times$ compared to single-vector approaches, severely limiting scalability. We introduce \textbf{ReinPool}, a reinforcement learning framework that learns to dynamically filter and pool multi-vector embeddings into compact, retrieval-optimized representations. By training with an inverse retrieval objective and NDCG-based rewards, ReinPool identifies and retains only the most discriminative vectors without requiring manual importance annotations. On the Vidore V2 benchmark across three vision-language embedding models, ReinPool compresses multi-vector representations by $746$--$1249\times$ into single vectors while recovering 76--81\% of full multi-vector retrieval performance. Compared to static mean pooling baselines, ReinPool achieves 22--33\% absolute NDCG@3 improvement, demonstrating that learned selection significantly outperforms heuristic aggregation.
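
To make the contrast between static mean pooling and selective pooling concrete, the following is a minimal sketch rather than the ReinPool implementation. It assumes the learned policy can be summarized as a 0/1 keep mask over token vectors and that the reward is a standard linear-gain NDCG@k over a ranked list; the names `mean_pool`, `masked_pool`, `ndcg_at_k`, and `compression_ratio` are hypothetical and introduced only for illustration.

```python
import math


def mean_pool(vectors):
    """Static baseline: average every token vector into a single vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def masked_pool(vectors, keep):
    """Average only the vectors flagged in `keep`.

    `keep` is a hypothetical 0/1 mask standing in for the output of a
    learned selection policy; the abstract does not specify its form.
    """
    kept = [v for v, k in zip(vectors, keep) if k]
    # Fall back to plain mean pooling if the mask drops every vector.
    return mean_pool(kept if kept else vectors)


def ndcg_at_k(relevances, k=3):
    """NDCG@k for relevance grades given in ranked order (linear gain)."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def compression_ratio(num_vectors):
    """Pooling N vectors into one of the same dimension and precision
    reduces storage by a factor of N (an assumption of this sketch)."""
    return num_vectors / 1


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(mean_pool(tokens))               # averages all three vectors
    print(masked_pool(tokens, [1, 0, 1]))  # averages only the first and third
    print(ndcg_at_k([0, 1, 0], k=3))       # relevant item ranked second
```

In this toy setting, a policy trained with an NDCG-based reward would be encouraged to choose masks whose pooled vector ranks the relevant item higher than the mean-pooled vector does.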
