Recently, significant progress has been made in multi-modal continual learning, which aims to learn new tasks sequentially in multi-modal settings while preserving performance on previously learned ones. However, existing methods mainly focus on coarse-grained tasks and fall short of addressing modality entanglement in fine-grained continual learning settings. To bridge this gap, we introduce a novel Continual Audio-Visual Segmentation (CAVS) task, which aims to continuously segment new classes guided by audio. Through comprehensive analysis, two critical challenges are identified: 1) multi-modal semantic drift, where sounding objects are labeled as background in sequential tasks; 2) co-occurrence confusion, where frequently co-occurring classes tend to be confused with one another. In this work, a Collision-based Multi-modal Rehearsal (CMR) framework is designed to address these challenges. Specifically, for multi-modal semantic drift, a Multi-modal Sample Selection (MSS) strategy is proposed to select samples with high modal consistency for rehearsal. Meanwhile, for co-occurrence confusion, a Collision-based Sample Rehearsal (CSR) mechanism is designed, which increases the rehearsal frequency of confusable classes during training. Moreover, we construct three audio-visual incremental scenarios to verify the effectiveness of our method. Comprehensive experiments demonstrate that our method significantly outperforms single-modal continual learning methods.
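The abstract does not specify how MSS and CSR are implemented; the following minimal Python sketch only illustrates the two rehearsal ideas at a high level, under assumptions of our own. The consistency scores, collision counts, and all function names here are hypothetical, not the paper's actual method.

```python
import random

def select_rehearsal_samples(samples, consistency, k):
    """MSS-style selection (sketch): keep the k samples with the highest
    audio-visual consistency score, i.e. those whose audio and visual
    cues agree most strongly, for the rehearsal memory.

    samples: list of sample ids
    consistency: dict mapping sample id -> assumed score in [0, 1]
    """
    return sorted(samples, key=lambda s: consistency[s], reverse=True)[:k]

def rehearsal_weights(memory_classes, collision_counts):
    """CSR-style weighting (sketch): give each stored class a replay
    probability proportional to how often it 'collides' (co-occurs and
    gets confused) with current-task classes, with add-one smoothing so
    every class keeps a nonzero chance of being rehearsed.
    """
    total = sum(collision_counts.get(c, 0) + 1 for c in memory_classes)
    return {c: (collision_counts.get(c, 0) + 1) / total for c in memory_classes}

def sample_rehearsal_class(memory_classes, collision_counts, rng=random):
    """Draw one class to rehearse, biased toward confusable classes."""
    w = rehearsal_weights(memory_classes, collision_counts)
    return rng.choices(memory_classes,
                       weights=[w[c] for c in memory_classes])[0]
```

Under this sketch, a class that frequently co-occurs with (and is confused for) a current-task class receives a larger weight and is therefore replayed more often, which is the intuition the CSR mechanism in the abstract describes.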