Cosine similarity is prevalent in contrastive learning, yet it implicitly assumes that embedding magnitude is noise. We systematically study magnitude learning through a framework that independently controls query-side and document-side normalization. First, magnitude learning benefits retrieval and Retrieval-Augmented Generation (RAG), where queries and documents play distinct roles, but not Semantic Textual Similarity (STS) or CLIP, where the two inputs are interchangeable. Second, query and document magnitudes serve different functions: document magnitude scales inference scores, while query magnitude modulates training gradients. Normalizing exactly one side consistently outperforms normalizing both, and the condition number of the Fisher Information Matrix predicts which side to normalize. Third, magnitude learning improves out-of-domain generalization more than in-domain performance, with gains of up to +72\% out-of-domain versus +7\% in-domain, provided the model has retrieval-specialized pre-training or sufficient training data. These findings provide practical guidance for retrieval and RAG across text and vision domains.
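As a minimal sketch of the framework, assuming a PyTorch-style dual encoder trained with an InfoNCE objective (the function names, batch shapes, and temperature below are illustrative, not taken from the paper), the four normalization configurations can be expressed with two independent flags:

```python
# Sketch of independent query-/document-side normalization (assumed setup).
import torch
import torch.nn.functional as F

def similarity(q: torch.Tensor, d: torch.Tensor,
               normalize_query: bool, normalize_doc: bool) -> torch.Tensor:
    """Pairwise similarity with independent per-side L2 normalization.

    normalize_query=True,  normalize_doc=True  -> cosine similarity
    normalize_query=False, normalize_doc=False -> raw dot product
    The two mixed settings let one side keep its learned magnitude.
    """
    if normalize_query:
        q = F.normalize(q, dim=-1)  # strip query magnitude
    if normalize_doc:
        d = F.normalize(d, dim=-1)  # strip document magnitude
    return q @ d.T                  # (num_queries, num_docs) score matrix

def info_nce_loss(q, d, normalize_query, normalize_doc, temperature=0.05):
    """Standard InfoNCE over in-batch negatives; positives on the diagonal."""
    logits = similarity(q, d, normalize_query, normalize_doc) / temperature
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)

# Example: query-side normalization only, so document magnitude is learned.
q = torch.randn(8, 128, requires_grad=True)
d = torch.randn(8, 128, requires_grad=True)
loss = info_nce_loss(q, d, normalize_query=True, normalize_doc=False)
loss.backward()
```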
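To see why the two magnitudes can play different roles (a worked sketch based on decomposing the standard dot-product score, not the paper's own derivation), write the unnormalized score as
\[
  s(q, d) \;=\; q^\top d \;=\; \lVert q \rVert \,\lVert d \rVert \cos\theta,
  \qquad
  \frac{\partial s}{\partial d} \;=\; q
  \;\;\Rightarrow\;\;
  \Bigl\lVert \tfrac{\partial s}{\partial d} \Bigr\rVert \;=\; \lVert q \rVert .
\]
At inference, a single query is scored against many documents, so \(\lVert q \rVert\) is a constant factor that cannot change the ranking, while \(\lVert d \rVert\) rescales each document's score individually. During training, the gradient of the score with respect to a document embedding is the query vector itself, so \(\lVert q \rVert\) scales the magnitude of the updates pushed onto documents.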