Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text. Noise inherent in CIR triplets introduces uncertainty and threatens model robustness. Probabilistic learning approaches have shown promise in addressing such issues; however, they fall short for CIR because they model each instance holistically and treat queries and targets homogeneously. This paper introduces a Heterogeneous Uncertainty-Guided (HUG) paradigm to overcome these limitations. HUG uses a fine-grained probabilistic learning framework in which queries and targets are represented by Gaussian embeddings that capture detailed concepts and their uncertainties. We customize heterogeneous uncertainty estimations for multi-modal queries and uni-modal targets. Given a query, we capture uncertainty regarding not only uni-modal content quality but also multi-modal coordination, and then apply a provable dynamic weighting mechanism to derive a comprehensive query uncertainty. We further design uncertainty-guided objectives, including a holistic query-target contrast and fine-grained contrasts with comprehensive negative sampling strategies, which enhance discriminative learning. Experiments on benchmarks show that HUG outperforms state-of-the-art baselines, and further analysis supports the technical contributions.
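To make the idea of Gaussian embeddings concrete, the following is a minimal sketch of how uncertainty can enter a retrieval score. It is not HUG's formulation, which the abstract does not specify. It assumes diagonal Gaussians and uses the standard expected squared Euclidean distance between two independent Gaussians, E||x - y||^2 = ||mu_q - mu_t||^2 + sum(var_q) + sum(var_t). The names `GaussianEmbedding`, `expected_sq_distance`, and `rank_targets` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GaussianEmbedding:
    """Diagonal Gaussian: a mean vector plus a per-dimension variance.

    Hypothetical container for illustration; not taken from the paper.
    """
    mean: List[float]
    var: List[float]  # per-dimension variances, assumed non-negative


def expected_sq_distance(a: GaussianEmbedding, b: GaussianEmbedding) -> float:
    """Expected squared Euclidean distance between independent x ~ a and y ~ b.

    For diagonal Gaussians this is the squared distance between the means
    plus the sum of all variances of both embeddings.
    """
    mean_term = sum((m1 - m2) ** 2 for m1, m2 in zip(a.mean, b.mean))
    var_term = sum(a.var) + sum(b.var)
    return mean_term + var_term


def rank_targets(query: GaussianEmbedding,
                 targets: List[GaussianEmbedding]) -> List[int]:
    """Return target indices ordered from closest to farthest from the query."""
    distances = [expected_sq_distance(query, t) for t in targets]
    return sorted(range(len(targets)), key=lambda i: distances[i])


if __name__ == "__main__":
    query = GaussianEmbedding(mean=[0.0, 0.0], var=[0.1, 0.1])
    confident_far = GaussianEmbedding(mean=[1.0, 0.0], var=[0.1, 0.1])
    uncertain_near = GaussianEmbedding(mean=[0.0, 0.0], var=[1.0, 1.0])
    print(rank_targets(query, [confident_far, uncertain_near]))
```

Under this score, a target whose mean coincides with the query can still rank below a more distant but more confident target, which is the kind of behavior an uncertainty-aware objective can exploit. How HUG estimates and weights the query-side and target-side uncertainties is not captured by this sketch.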