Dense retrieval represents queries and documents as high-dimensional embeddings, but these representations can be redundant at the query level: for a given information need, only a subset of dimensions is consistently helpful for ranking. Prior work addresses this with dimension importance estimation based on pseudo-relevance feedback (PRF), which can produce query-aware masks without labeled data but often relies on noisy pseudo signals and heuristic test-time procedures. Supervised adapter methods, by contrast, leverage relevance labels to improve embedding quality, yet they learn global transformations shared across all queries and do not explicitly model query-aware dimension importance. We propose a Query-Aware Adaptive Dimension Selection framework that \emph{learns} to predict per-dimension importance directly from the query embedding. We first use supervised relevance labels to construct oracle importance distributions over the embedding dimensions, and then train a predictor that maps a query embedding to these label-distilled importance scores. At inference time, the predictor selects a query-aware subset of dimensions for similarity computation from the query embedding alone, without pseudo-relevance feedback. Experiments across multiple dense retrievers and benchmarks show that the learned dimension selector improves retrieval effectiveness over the full-dimensional baseline, PRF-based masking, and supervised adapter baselines.
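To make the three stages concrete (oracle targets from labels, a query-to-importance predictor, and top-k masked scoring), here is a minimal NumPy sketch under stated assumptions. The abstract does not specify how the oracle distribution is built or what the predictor architecture is, so both are hypothetical stand-ins: `oracle_importance` takes a softmax over each dimension's contribution to the margin between a relevant document and its negatives, and `fit_linear_predictor` is a plain ridge regression. All function names, the temperature, the ridge strength, and `k` are illustrative choices, and the random vectors are placeholders, not data or results.

```python
import numpy as np

def oracle_importance(q, d_pos, d_negs, temperature=1.0):
    """Hypothetical label-distilled target: softmax over each dimension's
    contribution to the (relevant - negatives) inner-product margin."""
    pos_contrib = q * d_pos                  # per-dimension terms of <q, d_pos>, shape (dim,)
    neg_contrib = (q * d_negs).mean(axis=0)  # averaged over negatives, shape (dim,)
    margin = (pos_contrib - neg_contrib) / temperature
    margin = margin - margin.max()           # subtract max for numerical stability
    weights = np.exp(margin)
    return weights / weights.sum()           # a distribution over dimensions

def fit_linear_predictor(Q, targets, ridge=1e-2):
    """Hypothetical predictor: ridge regression from query embeddings (n, dim)
    to oracle importance vectors (n, dim); returns a (dim, dim) weight matrix."""
    dim = Q.shape[1]
    A = Q.T @ Q + ridge * np.eye(dim)
    return np.linalg.solve(A, Q.T @ targets)

def select_dims(q, W, k):
    """Predict importance from the query embedding alone and keep the top-k dimensions."""
    predicted = q @ W
    return np.argsort(-predicted)[:k]

def masked_scores(q, docs, dims):
    """Inner-product similarity restricted to the selected dimensions."""
    return docs[:, dims] @ q[dims]

# Usage with random placeholder embeddings (illustration only).
rng = np.random.default_rng(0)
dim, n_train, n_neg = 16, 200, 4
Q = rng.normal(size=(n_train, dim))
targets = np.stack([
    oracle_importance(q_i, rng.normal(size=dim), rng.normal(size=(n_neg, dim)))
    for q_i in Q
])
W = fit_linear_predictor(Q, targets)

docs = rng.normal(size=(100, dim))
q = rng.normal(size=dim)
dims = select_dims(q, W, k=8)
ranking = np.argsort(-masked_scores(q, docs, dims))  # document indices, best first
```

Note that no pseudo-relevance feedback appears at inference: `select_dims` sees only `q`, which mirrors the query-only selection described above. Selecting every dimension reduces `masked_scores` to the full-dimensional inner product, so the full-dimensional baseline is the special case `k = dim`.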