Dense retrieval represents queries and documents as high-dimensional embeddings, but these representations can be redundant at the query level: for a given information need, only a subset of dimensions is consistently helpful for ranking. Prior work addresses this via pseudo-relevance feedback (PRF) based dimension importance estimation, which can produce query-aware masks without labeled data but often relies on noisy pseudo signals and heuristic test-time procedures. In contrast, supervised adapter methods leverage relevance labels to improve embedding quality, yet they learn global transformations shared across queries and do not explicitly model query-aware dimension importance. We propose a Query-Aware Adaptive Dimension Selection framework that learns to predict per-dimension importance directly from the query embedding. We first construct oracle dimension importance distributions over embedding dimensions using supervised relevance labels, and then train a predictor to map a query embedding to these label-distilled importance scores. At inference, the predictor selects a query-aware subset of dimensions for similarity computation based solely on the query embedding, without pseudo-relevance feedback. Experiments across multiple dense retrievers and benchmarks show that our learned dimension selector improves retrieval effectiveness over the full-dimensional baseline as well as PRF-based masking and supervised adapter baselines.
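The inference step described above (predict per-dimension importance from the query embedding, keep a subset of dimensions, and score documents on that subset only) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the predictor is stood in for by an untrained linear map `weights`, the subset is chosen by a fixed top-`k` rule, similarity is a dot product, and the function names `predict_importance`, `select_dimensions`, and `masked_score` are hypothetical.

```python
import random


def predict_importance(query_emb, weights):
    """Hypothetical stand-in for the learned predictor: a linear map from the
    query embedding to one importance score per embedding dimension."""
    return [sum(w * q for w, q in zip(row, query_emb)) for row in weights]


def select_dimensions(importance, k):
    """Return the indices of the k highest-scoring dimensions, in index order.
    A fixed top-k rule is an assumption; other selection rules are possible."""
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(order[:k])


def masked_score(query_emb, doc_emb, dims):
    """Dot-product similarity computed only over the selected dimensions."""
    return sum(query_emb[i] * doc_emb[i] for i in dims)


# Toy data: random embeddings and an untrained placeholder predictor.
random.seed(0)
d, n_docs, k = 8, 5, 4
query = [random.gauss(0, 1) for _ in range(d)]
docs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_docs)]
weights = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]

# Query-aware selection uses the query embedding only (no feedback documents).
dims = select_dimensions(predict_importance(query, weights), k)
scores = [masked_score(query, doc, dims) for doc in docs]

# Rank documents by masked similarity, best first.
ranking = sorted(range(n_docs), key=lambda j: scores[j], reverse=True)
```

In this sketch the selected dimensions depend only on the query, so document embeddings can stay precomputed and the subset is applied at scoring time. In the proposed framework the predictor would instead be trained against the label-derived oracle importance distributions rather than initialized at random.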