DQE-CIR: Distinctive Query Embeddings through Learnable Attribute Weights and Target Relative Negative Sampling in Composed Image Retrieval

Composed image retrieval (CIR) is the task of retrieving a target image by jointly interpreting a reference image and a modification text that specifies the intended change. Most existing methods are built on contrastive learning frameworks that treat the ground truth image as the only positive instance and all remaining images as negatives. This strategy introduces two problems: relevance suppression, where images that are semantically related and would be valid results are incorrectly pushed away, and semantic confusion, where different modification intents collapse into overlapping regions of the embedding space. As a result, the learned query representations often lack discriminative power, particularly for fine-grained attribute modifications. To overcome these limitations, we propose distinctive query embeddings through learnable attribute weights and target relative negative sampling (DQE-CIR), a method that learns distinctive query embeddings by explicitly modeling target relative relevance during training. DQE-CIR incorporates learnable attribute weighting to emphasize distinctive visual features conditioned on the modification text, enabling more precise feature alignment between language and vision. Furthermore, we introduce target relative negative sampling, which constructs a target relative similarity distribution and selects informative negatives from a mid-zone region that excludes both easy negatives and ambiguous false negatives. This strategy enables more reliable retrieval for fine-grained attribute changes by improving query discriminativeness and reducing confusion caused by semantically similar but irrelevant candidates.
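
The mid-zone selection described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the mid-zone is delimited, so this example assumes a simple rank band over cosine similarity to the target embedding. The function names and the `low_q` / `high_q` bounds are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mid_zone_negatives(target, candidates, low_q=0.25, high_q=0.75):
    """Return indices of candidates in the middle band of similarity to the target.

    Assumption (not from the paper): the mid-zone is the rank band
    [low_q, high_q) of the candidates sorted by similarity to the target.
    Candidates below the band are treated as easy negatives; candidates
    above it are treated as possible false negatives. Both are skipped.
    """
    sims = [cosine(target, c) for c in candidates]
    # Candidate indices sorted from least to most similar to the target.
    order = sorted(range(len(candidates)), key=lambda i: sims[i])
    lo = int(low_q * len(order))
    hi = int(high_q * len(order))
    return order[lo:hi]


# Toy 2-D embeddings: index 0 is nearly identical to the target,
# index 7 points in the opposite direction.
target = [1.0, 0.0]
candidates = [
    [0.99, 0.1],
    [0.9, 0.4],
    [0.7, 0.7],
    [0.5, 0.9],
    [0.2, 1.0],
    [0.0, 1.0],
    [-0.5, 1.0],
    [-1.0, 0.0],
]
selected = mid_zone_negatives(target, candidates)
```

With these toy vectors, the two most similar candidates (indices 0 and 1) and the two least similar (indices 6 and 7) are skipped, and the four in between are kept as negatives. A real implementation would likely operate on batched tensors and might use similarity thresholds rather than rank quantiles.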

