Audio analysis is useful in many application scenarios. State-of-the-art audio analysis approaches assume that the data distribution is the same at training and deployment time. However, because of various real-life challenges, the data distribution may drift, and new classes may appear later on. Thus, a model trained only once might not perform adequately. Continual learning (CL) approaches are designed to handle such changes in the data distribution. There have been a few attempts to apply CL approaches to audio analysis, yet a systematic evaluation framework is still lacking. In this paper, we create a comprehensive CL dataset and characterize CL approaches for audio-based monitoring tasks. We investigate the following CL and non-CL approaches: EWC, LwF, SI, GEM, A-GEM, GDumb, Replay, Naive, Cumulative, and Joint training. This study offers guidance to researchers and practitioners in audio analysis who develop adaptive models. We observed that Replay achieved better results than the other methods on the DCASE challenge data, reaching an accuracy of 70.12% in the domain-incremental scenario and 96.98% in the class-incremental scenario.
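Because Replay gave the best results on the DCASE data, a minimal sketch of the general idea may help readers new to it: keep a small memory of past training samples and mix some of them into every batch drawn from a new experience, so the model keeps rehearsing earlier domains or classes. This is a generic illustration under stated assumptions, not the implementation evaluated in this study. The reservoir-sampling memory policy, the names `ReservoirReplayBuffer` and `replay_batches`, and the toy `(clip_id, label)` data are all hypothetical.

```python
import random
from typing import Generic, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class ReservoirReplayBuffer(Generic[T]):
    """Hypothetical fixed-size memory of past samples, filled by reservoir sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0  # total number of samples offered to the buffer so far
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        # Once the buffer is full, every sample seen so far is retained
        # with equal probability capacity / seen.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)  # uniform integer in [0, seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> List[T]:
        # Draw up to k stored samples without replacement.
        return self.rng.sample(self.items, min(k, len(self.items)))


def replay_batches(
    new_data: Sequence[T],
    buffer: ReservoirReplayBuffer[T],
    batch_size: int,
    replay_size: int,
) -> Iterator[List[T]]:
    """Yield batches that mix samples from the current experience with replayed ones."""
    for start in range(0, len(new_data), batch_size):
        batch = list(new_data[start:start + batch_size])
        yield batch + buffer.sample(replay_size)
        # Offer the new samples to memory only after the batch has been used.
        for item in batch:
            buffer.add(item)


if __name__ == "__main__":
    buffer: ReservoirReplayBuffer = ReservoirReplayBuffer(capacity=4)
    # Two toy "experiences", e.g. a first and a second set of sound classes.
    experience_1 = [(f"clip{i}", "dog") for i in range(6)]
    experience_2 = [(f"clip{i}", "siren") for i in range(6, 12)]
    for experience in (experience_1, experience_2):
        for mixed in replay_batches(experience, buffer, batch_size=3, replay_size=2):
            pass  # a real pipeline would take a gradient step on `mixed` here
    print(len(buffer.items), buffer.seen)
```

The memory size and the ratio of replayed to new samples per batch are the main knobs in such a scheme. Reservoir sampling is just one simple way to keep the memory representative of everything seen so far; class- or experience-balanced memories are common alternatives.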