Dense vector retrieval is the practical backbone of Retrieval-Augmented Generation (RAG), but similarity search alone often lacks precision: a passage that is semantically close to the query is not necessarily useful for generating the answer. Conversely, utility-based approaches that re-rank passages with an LLM often achieve superior performance, but they are computationally prohibitive and prone to the noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to combine the efficiency of dense retrieval with the accuracy of utility-based re-ranking in a practical, high-performance retrieval method. We formulate retrieval as a distribution matching problem, training a bi-encoder with a Utility-Modulated InfoNCE objective to imitate a utility distribution derived from perplexity reduction. This approach injects graded utility signals directly into the embedding space without requiring test-time LLM inference. On the QASPER benchmark, UAE improves retrieval Recall@1 by 30.59%, MAP by 30.16%, and Token F1 by 17.3% over the strong semantic baseline BGE-Base. Crucially, UAE is over 180x faster than efficient LLM re-ranking methods while preserving competitive performance, demonstrating that aligning retrieval with generative utility yields reliable contexts at scale.
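The distribution-matching idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the Utility-Modulated InfoNCE objective, so everything below is an assumption made for illustration: utility is taken to be the reduction in the answer's mean negative log-likelihood (log-perplexity) when a passage is added to the prompt, the target distribution is a softmax over those utilities, and the loss is the cross-entropy between that target and the softmax over query-passage similarities. The function names and temperature values are hypothetical, not the authors' implementation.

```python
import math


def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def softmax(scores, temperature=1.0):
    """Softmax computed from the stable log-softmax."""
    return [math.exp(v) for v in log_softmax(scores, temperature)]


def utility_targets(nll_without, nll_with, temperature=1.0):
    """Assumed target distribution derived from perplexity reduction.

    nll_without: mean answer NLL given the question alone.
    nll_with:    mean answer NLL given the question plus passage i.
    The utility of passage i is the log-perplexity reduction
    nll_without - nll_with[i]; a softmax turns utilities into a distribution.
    """
    utilities = [nll_without - n for n in nll_with]
    return softmax(utilities, temperature)


def soft_infonce(similarities, targets, temperature=0.05):
    """Cross-entropy between graded targets and the similarity softmax.

    With a one-hot target this reduces to the standard InfoNCE loss.
    """
    log_q = log_softmax(similarities, temperature)
    return -sum(p * lq for p, lq in zip(targets, log_q))


# Hypothetical numbers: three candidate passages for one question.
targets = utility_targets(nll_without=3.0, nll_with=[1.2, 2.5, 3.1])
loss = soft_infonce(similarities=[0.82, 0.79, 0.40], targets=targets)
print(targets, loss)
```

Under these assumptions, a passage that lowers the answer's perplexity more receives a larger share of the target mass, so the bi-encoder is pushed to rank passages by graded usefulness rather than by a single binary positive. Because the utilities are computed offline at training time, nothing in this loss requires an LLM call at query time.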