Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Unlike previous approaches that focus on enhancing multi-modal representations or exploiting cross-modal correspondence for more accurate retrieval, in this paper we leverage knowledge transfer between peer branches in a boosting manner to build a more powerful matching model. Specifically, we propose a novel Deep Boosting Learning (DBL) algorithm, in which an anchor branch is first trained to provide insights into the data properties, while a target branch acquires this more advanced knowledge to develop optimal features and distance metrics. Concretely, the anchor branch initially learns the absolute or relative distance between positive and negative pairs, providing a foundational understanding of the particular network and data distribution. Building upon this knowledge, the target branch is concurrently trained under more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples. Extensive experiments validate that our DBL achieves impressive and consistent improvements over various recent state-of-the-art models in the image-text matching field, and outperforms related popular cooperative strategies, e.g., Conventional Distillation, Mutual Learning, and Contrastive Learning. Beyond this, we confirm that DBL can be seamlessly integrated into their training scenarios and achieves superior performance under the same computational costs, demonstrating the flexibility and broad applicability of our proposed method. Our code is publicly available at: https://github.com/Paranioar/DBL.
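To make the two-branch idea concrete, the sketch below illustrates one plausible reading of the loss design: the anchor branch uses a standard hinge triplet loss, and the target branch is trained with an adaptive margin that asks it to exceed the gap the anchor branch has already achieved. This is a minimal illustration, not the paper's exact formulation; the function names, the `eps` slack term, and the specific adaptive-margin rule are all hypothetical.

```python
def triplet_loss(d_pos, d_neg, margin):
    """Standard hinge triplet loss for the anchor branch:
    penalize unless the negative-pair distance d_neg exceeds
    the positive-pair distance d_pos by at least `margin`."""
    return max(0.0, margin + d_pos - d_neg)


def boosted_target_loss(d_pos_t, d_neg_t, d_pos_a, d_neg_a,
                        base_margin, eps=0.1):
    """Hypothetical target-branch loss with an adaptive margin:
    the gap already achieved by the anchor branch (d_neg_a - d_pos_a)
    sets a floor, and the target branch must exceed it by `eps`."""
    anchor_gap = d_neg_a - d_pos_a
    adaptive_margin = max(base_margin, anchor_gap + eps)
    return max(0.0, adaptive_margin + d_pos_t - d_neg_t)


# With a gap of 0.5, the anchor branch already satisfies a 0.2 margin,
# so its loss is zero; the target branch, asked to beat that gap by 0.1,
# still receives a non-zero gradient signal on the same distances.
print(triplet_loss(0.3, 0.8, 0.2))                       # 0.0
print(boosted_target_loss(0.3, 0.8, 0.3, 0.8, 0.2))      # ~0.1
```

In this reading, the target branch keeps being pushed even on triplets the anchor branch has already separated, which is one way to realize the "more adaptive margin constraints" the abstract describes.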