Dense retrieval models exhibit positional bias: retrieval effectiveness degrades when relevant information appears later in a passage (Zeng et al., 2025). We ask whether this bias can be reduced at inference time, without retraining and without sacrificing overall retrieval effectiveness. To this end, we adapt inference-time attention calibration (Schuhmacher et al., 2026) to downstream retrieval and extend it with a strength coefficient lambda that interpolates between the original and fully calibrated attention distributions. Across three embedding models on SQuAD-PosQ and FineWeb-PosQ, we examine how basket size, calibrated layer set, and calibration strength affect the trade-off between positional fairness and retrieval effectiveness, and find that partial calibration frequently outperforms full calibration. A single configuration (B=128, lambda=0.5, 50% layer depth) improves the harmonic mean of nDCG@10 across positional groups on FineWeb-PosQ for all three models without per-model tuning, and it applies to both <s>-pooled and last-token-pooled architectures. This default configuration transfers without modification to PosIR, which spans 10 languages and 31 domains: it reduces the Position Sensitivity Index in all 16 length-quartile × model × retrieval-setting combinations while preserving or improving aggregate nDCG@10. We release our extended codebase at https://github.com/impresso/fair-sentence-transformers.
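As a rough illustration of the two quantities named above, the sketch below blends an original and a calibrated attention distribution with a strength coefficient and computes a harmonic mean over per-group scores. It is an assumption-laden sketch, not the released implementation: the function names `interpolate_attention` and `harmonic_mean` are hypothetical, the interpolation is assumed to be a convex combination in probability space, and the example numbers are made up for illustration only.

```python
from typing import List, Sequence


def interpolate_attention(
    original: Sequence[float], calibrated: Sequence[float], lam: float
) -> List[float]:
    """Blend two attention distributions over the same tokens.

    Assumption: interpolation is a convex combination of probabilities,
    so lam=0 keeps the original distribution and lam=1 gives the fully
    calibrated one. The actual method may interpolate differently.
    """
    if len(original) != len(calibrated):
        raise ValueError("distributions must cover the same tokens")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mixed = [(1.0 - lam) * o + lam * c for o, c in zip(original, calibrated)]
    total = sum(mixed)
    # Renormalize to guard against rounding drift in the inputs.
    return [m / total for m in mixed]


def harmonic_mean(scores: Sequence[float]) -> float:
    """Harmonic mean of per-group scores, e.g. nDCG@10 per positional group.

    It is dominated by the weakest group, so it rewards balanced results.
    Returns 0.0 if any group scores zero or below.
    """
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0.0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


# Made-up toy values, not results from the paper.
original = [0.7, 0.2, 0.1]      # attention skewed toward early tokens
calibrated = [0.4, 0.3, 0.3]    # hypothetical fully calibrated attention
partial = interpolate_attention(original, calibrated, lam=0.5)
balanced_score = harmonic_mean([0.6, 0.3])
```

With these toy inputs, `lam=0.5` yields a distribution halfway between the two, which is the sense in which partial calibration sits between no calibration and full calibration.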