Composed image retrieval (CIR) is the task of retrieving a target image by jointly interpreting a reference image and a modification text that specifies the intended change. Most existing methods still build on contrastive learning frameworks that treat the ground-truth image as the only positive instance and all remaining images as negatives. This strategy inevitably introduces two problems: relevance suppression, where semantically related and still valid images are incorrectly pushed away, and semantic confusion, where different modification intents collapse into overlapping regions of the embedding space. As a result, the learned query representations often lack discriminative power, particularly for fine-grained attribute modifications. To overcome these limitations, we propose distinctive query embeddings through learnable attribute weights and target-relative negative sampling (DQE-CIR), a method that learns distinctive query embeddings by explicitly modeling target-relative relevance during training. DQE-CIR incorporates learnable attribute weighting to emphasize distinctive visual features conditioned on the modification text, enabling more precise alignment between language and vision features. Furthermore, we introduce target-relative negative sampling, which constructs a target-relative similarity distribution and selects informative negatives from a mid-zone region that excludes both easy negatives and ambiguous false negatives. This strategy enables more reliable retrieval for fine-grained attribute changes by improving query discriminativeness and reducing confusion caused by semantically similar but irrelevant candidates.
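The two components described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify how the attribute weights are parameterized or how the mid-zone is bounded, so the sigmoid gate, the additive fusion, the quantile thresholds (`low_q`, `high_q`), and all function names below are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def attribute_weighted_query(ref_feat, text_feat, gate_matrix):
    """Reweight reference-image dimensions conditioned on the modification text.

    gate_matrix stands in for a learnable parameter (hypothetical, shape D x D).
    A sigmoid gate in (0, 1) scales each visual dimension; adding the text
    feature afterwards is an assumed fusion rule, not taken from the paper.
    """
    gate = 1.0 / (1.0 + np.exp(-(text_feat @ gate_matrix)))
    return l2_normalize(gate * ref_feat + text_feat)


def mid_zone_negatives(target_feat, candidate_feats, target_index,
                       low_q=0.5, high_q=0.9):
    """Select negatives from the middle of the target-relative similarity range.

    Candidates below the low_q quantile are treated as easy negatives and
    candidates above the high_q quantile as possible false negatives; both
    groups are skipped. The quantile values are illustrative assumptions.
    """
    sims = l2_normalize(candidate_feats) @ l2_normalize(target_feat)
    others = np.delete(np.arange(len(sims)), target_index)  # drop the positive
    low, high = np.quantile(sims[others], [low_q, high_q])
    keep = (sims[others] >= low) & (sims[others] <= high)
    return others[keep], sims


# Usage with random stand-in features (D = 8, gallery of 20 images).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(20, 8))
target_idx = 3
query = attribute_weighted_query(rng.normal(size=8), rng.normal(size=8),
                                 rng.normal(size=(8, 8)))
neg_idx, sims = mid_zone_negatives(gallery[target_idx], gallery, target_idx)
```

In a training loop, the indices in `neg_idx` would supply the negatives for the contrastive loss on `query`, instead of treating every non-target gallery image as a negative.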