Continual learning involves training neural networks incrementally on new tasks while retaining knowledge of previous ones. However, efficiently fine-tuning a model on sequential tasks with minimal computational resources remains a challenge. In this paper, we propose Task Incremental Continual Learning (TI-CL) of audio classifiers with Audio Spectrogram Transformers (AST) that are both parameter-efficient and compute-efficient. To reduce the number of trainable parameters without degrading performance in TI-CL, we compare several Parameter Efficient Transfer (PET) methods and propose AST with Convolutional Adapters, which has fewer than 5% of the trainable parameters of its fully fine-tuned counterpart. To reduce computational complexity, we introduce a novel Frequency-Time factorized Attention (FTA) method that replaces the traditional self-attention in transformers for audio spectrograms. FTA achieves competitive performance with only a fraction of the computation required by Global Self-Attention (GSA). Finally, we formulate our method for TI-CL, called Adapter Incremental Continual Learning (AI-CL), as a combination of the "parameter-efficient" Convolutional Adapter and the "compute-efficient" FTA. Experiments on the ESC-50, SpeechCommandsV2 (SCv2), and Audio-Visual Event (AVE) benchmarks show that the proposed method prevents catastrophic forgetting in TI-CL while requiring a lower computational budget.
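As a rough illustration of why factorizing attention along the frequency and time axes reduces cost relative to GSA, the sketch below counts query-key pairs for both schemes on a grid of spectrogram patches. It assumes that, under FTA, each token attends only to the tokens sharing its time frame and the tokens sharing its frequency band; this is one plausible reading of the factorization, not necessarily the exact formulation. The function names and the 12 x 101 patch grid are illustrative assumptions.

```python
def gsa_pairs(num_freq: int, num_time: int) -> int:
    """Query-key pairs when every token attends to every other token."""
    n = num_freq * num_time  # total number of patch tokens
    return n * n


def fta_pairs(num_freq: int, num_time: int) -> int:
    """Query-key pairs under an ASSUMED frequency-time factorization.

    Each token attends to the num_freq tokens in its own time frame
    and to the num_time tokens in its own frequency band.
    """
    n = num_freq * num_time
    return n * (num_freq + num_time)


# Hypothetical patch grid: 12 frequency patches by 101 time patches.
F, T = 12, 101
ratio = fta_pairs(F, T) / gsa_pairs(F, T)
print(f"GSA pairs: {gsa_pairs(F, T)}")
print(f"FTA pairs: {fta_pairs(F, T)}")
print(f"FTA / GSA: {ratio:.3f}")
```

Under this assumption the ratio simplifies to (F + T) / (F * T), so the saving grows as the patch grid gets larger.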