We investigate composed image retrieval with text feedback. Users gradually look for the target of interest by moving from coarse- to fine-grained feedback. However, existing methods focus only on the latter, i.e., fine-grained search, by harnessing positive and negative pairs during training. This pair-based paradigm considers only the one-to-one distance between a pair of specific points, which is not aligned with the one-to-many coarse-grained retrieval process and compromises the recall rate. To fill this gap, we introduce a unified learning approach that simultaneously models coarse- and fine-grained retrieval by considering multi-grained uncertainty. The key idea underpinning the proposed method is to treat fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively. Specifically, our method contains two modules: uncertainty modeling and uncertainty regularization. (1) Uncertainty modeling simulates multi-grained queries by introducing identically distributed fluctuations in the feature space. (2) Building on uncertainty modeling, uncertainty regularization adapts the matching objective according to the fluctuation range. Compared with existing methods, the proposed strategy explicitly prevents the model from pushing away potential candidates in the early stage, and thus improves the recall rate. On three public datasets, i.e., FashionIQ, Fashion200k, and Shoes, the proposed method achieves +4.03%, +3.38%, and +2.40% Recall@50 accuracy over a strong baseline, respectively.
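The two modules described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `jitter` and `uncertainty_weighted_loss`, the Gaussian noise model, the squared Euclidean distance, and the `exp(-log_var) * distance + log_var` weighting are all assumptions chosen for illustration. The abstract states only that fluctuations are identically distributed and that the matching objective adapts to the fluctuation range.

```python
import math
import random


def jitter(feature, sigma, rng):
    """Add i.i.d. zero-mean Gaussian noise with std `sigma` to each dimension.

    Assumption: a small sigma stands in for a fine-grained query and a
    large sigma for a coarse-grained one.
    """
    return [x + rng.gauss(0.0, sigma) for x in feature]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def uncertainty_weighted_loss(query, target, log_var):
    """Hypothetical regularized matching loss.

    The distance term is scaled by exp(-log_var), so a larger assumed
    fluctuation range relaxes the matching penalty; the additive log_var
    term discourages claiming unlimited uncertainty.
    """
    return math.exp(-log_var) * sq_dist(query, target) + log_var


rng = random.Random(0)
query = [0.0, 0.0, 0.0]
target = [2.0, 2.0, 2.0]

fine_query = jitter(query, 0.01, rng)    # small fluctuation
coarse_query = jitter(query, 1.0, rng)   # large fluctuation

fine_loss = uncertainty_weighted_loss(fine_query, target, log_var=0.0)
coarse_loss = uncertainty_weighted_loss(coarse_query, target, log_var=2.0)
print(fine_loss, coarse_loss)
```

In this sketch, a query perturbed with a large fluctuation is paired with a larger `log_var`, which weakens the pull toward one specific target. That is one simple way to avoid pushing away other plausible candidates during coarse-grained matching.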