In this paper, we propose a method for incrementally learning two distinct tasks over time: acoustic scene classification (ASC) and audio tagging (AT). A simple convolutional neural network (CNN) serves as the incremental learner for both tasks. Incremental learning methods generally suffer from catastrophic forgetting: when trained sequentially on a new task, they lose performance on the previous one. To alleviate this problem, we use independent learning and knowledge distillation (KD) between time steps. Experiments are performed on the TUT 2016/2017 dataset, which contains 4 acoustic scene classes and 25 sound event classes. At the incremental time step, the proposed learner solves the AT task with an F1 score of 54.4% and the ASC task with an accuracy of 88.9%, outperforming a multi-task system that solves ASC and AT simultaneously. ASC accuracy degrades by only 5.1 percentage points from its initial-time-step value of 94.0%.
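The abstract does not specify the exact distillation formulation, so the following is only a minimal sketch assuming standard temperature-scaled distillation: a frozen copy of the previous-time-step model (the teacher) provides soft ASC targets, and the updated model is trained on a multi-label AT loss plus a KD term that keeps its ASC outputs close to the teacher's. The function names, the weight `alpha`, and the temperature value are illustrative assumptions, not the authors' settings; the "independent learning" component is not modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures (assumed choice)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def bce_multilabel(logits, targets):
    """Mean binary cross-entropy with logits, one sigmoid per sound event tag.
    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|))."""
    losses = [
        max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        for z, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


def incremental_step_loss(old_asc_logits, new_asc_logits, at_logits, at_targets,
                          alpha=1.0, temperature=2.0):
    """Hypothetical loss for the incremental time step:
    new-task AT loss + alpha * KD loss that preserves old-task ASC behavior."""
    at_loss = bce_multilabel(at_logits, at_targets)
    kd_loss = distillation_loss(old_asc_logits, new_asc_logits, temperature)
    return at_loss + alpha * kd_loss
```

For example, with 4 ASC logits from the old and new models and 25 AT logits with binary targets, `incremental_step_loss(old, new, at, y)` returns a scalar that reduces to the AT loss alone when the new model's ASC outputs match the teacher's exactly. Distilling on softened probabilities rather than hard labels lets the new model retain the teacher's relative confidence across scene classes without access to the original ASC labels.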