Generative-based Fusion Mechanism for Multi-Modal Tracking

Generative models (GMs) have attracted increasing research interest owing to their remarkable capacity for comprehensive understanding. However, their potential in multi-modal tracking remains largely unexplored. In this context, we investigate how generative techniques can address a critical challenge in multi-modal tracking: information fusion. In this paper, we study two prominent GM techniques, namely Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs). In the standard fusion process, the features from each modality are fed directly into the fusion block. Instead, we condition these multi-modal features with random noise within the GM framework, effectively transforming the original training samples into harder instances. This design excels at extracting discriminative clues from the features, which improves the final tracking performance. To quantitatively gauge the effectiveness of our approach, we conduct extensive experiments across two multi-modal tracking tasks, three baseline methods, and three challenging benchmarks. The experimental results demonstrate that the proposed generative-based fusion mechanism achieves state-of-the-art performance, setting new records on LasHeR and RGBD1K.
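The abstract contrasts two kinds of fusion block. A standard one receives the modality features directly. A generative one receives the features conditioned with random noise. The following is a minimal sketch of that input-side difference only, under explicit assumptions:

* The one-layer fusion block, the feature widths `C` and `NOISE_DIM`, and the linear DDPM noise schedule are hypothetical choices, not the authors' architecture.
* Training is omitted entirely, both the adversarial loss and the denoising objective.
* The CGAN-style variant concatenates a noise vector z with the conditioning features, as in a generic conditional GAN generator.
* The diffusion-style variant applies the generic DDPM forward process q(x_t | x_0) to the concatenated features.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8            # hypothetical per-modality feature width
NOISE_DIM = 4    # hypothetical noise width for the CGAN-style variant

def fusion_block(x, weight):
    """Toy fusion block (hypothetical): one linear layer followed by ReLU."""
    return np.maximum(x @ weight, 0.0)

def standard_input(f_rgb, f_aux):
    """Baseline: modality features are fed to the fusion block directly."""
    return np.concatenate([f_rgb, f_aux], axis=-1)

def cgan_style_input(f_rgb, f_aux, rng, noise_dim=NOISE_DIM):
    """Generic CGAN-style conditioning: random noise z is concatenated with
    the multi-modal features, which act as the condition."""
    z = rng.standard_normal(f_rgb.shape[:-1] + (noise_dim,))
    return np.concatenate([z, f_rgb, f_aux], axis=-1)

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear DDPM noise schedule."""
    return np.cumprod(1.0 - np.linspace(beta_start, beta_end, num_steps))

def diffusion_style_input(f_rgb, f_aux, t, alpha_bar, rng):
    """Generic DDPM forward process on the concatenated features:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    x0 = standard_input(f_rgb, f_aux)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# Hypothetical features for a batch of 2 tokens.
f_rgb = rng.standard_normal((2, C))
f_aux = rng.standard_normal((2, C))   # e.g. thermal (RGB-T) or depth (RGB-D)

# Random, untrained weights, used only to show the data flow.
w_std = rng.standard_normal((2 * C, C))
w_cgan = rng.standard_normal((NOISE_DIM + 2 * C, C))

fused_std = fusion_block(standard_input(f_rgb, f_aux), w_std)
fused_cgan = fusion_block(cgan_style_input(f_rgb, f_aux, rng), w_cgan)

alpha_bar = linear_alpha_bar()
x_t = diffusion_style_input(f_rgb, f_aux, t=500, alpha_bar=alpha_bar, rng=rng)
fused_dm = fusion_block(x_t, w_std)
```

In both variants, the fusion block sees a stochastic, perturbed input during training. This is one way to read the abstract's statement that noise turns training samples into harder instances. The sketch does not reproduce how the paper couples the noise, the conditions, and the loss.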