Continual learning involves training neural networks incrementally on new tasks while retaining the knowledge of previous tasks. However, efficiently fine-tuning a model on sequential tasks with minimal computational resources remains a challenge. In this paper, we propose Task Incremental Continual Learning (TI-CL) of audio classifiers with Audio Spectrogram Transformers (AST) that are both parameter-efficient and compute-efficient. To reduce the number of trainable parameters without degrading TI-CL performance, we compare several Parameter Efficient Transfer (PET) methods and propose AST with Convolutional Adapters for TI-CL, which has fewer than 5% of the trainable parameters of its fully fine-tuned counterpart. To reduce the computational complexity, we introduce a novel Frequency-Time factorized Attention (FTA) method that replaces the traditional self-attention in transformers for audio spectrograms. FTA achieves competitive performance with only a fraction of the computation required by Global Self-Attention (GSA). Finally, we formulate our method for TI-CL, called Adapter Incremental Continual Learning (AI-CL), as a combination of the "parameter-efficient" Convolutional Adapter and the "compute-efficient" FTA. Experiments on the ESC-50, SpeechCommandsV2 (SCv2), and Audio-Visual Event (AVE) benchmarks show that our proposed method prevents catastrophic forgetting in TI-CL while maintaining a lower computational budget.
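To give a rough sense of why factorizing attention along the frequency and time axes can reduce computation, the following sketch counts query-key pairs for a grid of spectrogram tokens. It assumes, as an illustration rather than as the definition used in this work, that each token under FTA attends only to tokens sharing its time frame or its frequency band, whereas under GSA each token attends to every token. The function name `attention_pair_counts` and the 12 x 101 grid are hypothetical and are not taken from the paper.

```python
def attention_pair_counts(num_freq, num_time):
    """Count query-key pairs for a num_freq x num_time grid of tokens.

    Assumption (illustrative only): factorized attention lets each token
    attend along its frequency axis and along its time axis, while global
    self-attention lets each token attend to all tokens.
    """
    num_tokens = num_freq * num_time
    # Global self-attention: every token scores against every token.
    gsa_pairs = num_tokens * num_tokens
    # Factorized attention: num_freq pairs along frequency plus
    # num_time pairs along time, for every token.
    fta_pairs = num_tokens * (num_freq + num_time)
    return gsa_pairs, fta_pairs


if __name__ == "__main__":
    # Hypothetical grid: 12 frequency patches by 101 time patches.
    gsa, fta = attention_pair_counts(12, 101)
    print("GSA pairs:", gsa)
    print("FTA pairs:", fta)
    print("FTA / GSA:", round(fta / gsa, 4))
```

Under these assumptions the ratio of pair counts is (F + T) / (F * T), so the saving grows with the size of the token grid. The count ignores projections, softmax, and feed-forward layers, so it is only a coarse proxy for the actual computational budget.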