In this work, we introduce SeQuiFi, a novel approach for mitigating catastrophic forgetting (CF) in speech emotion recognition (SER). SeQuiFi adopts a sequential class-finetuning strategy, in which the model is fine-tuned incrementally on one emotion class at a time, preserving and strengthening retention of each class. Although various state-of-the-art (SOTA) methods, such as regularization-based, memory-based, and weight-averaging techniques, have been proposed to address CF, it remains a challenge, particularly on diverse and multilingual datasets. Through extensive experiments, we show that SeQuiFi significantly outperforms both vanilla fine-tuning and SOTA continual learning methods in accuracy and F1 score on multiple benchmark SER datasets spanning different languages, including CREMA-D, RAVDESS, Emo-DB, MESD, and SHEMO.
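The abstract describes the strategy only at a high level. Below is a minimal sketch, in PyTorch, of what sequential class-wise fine-tuning could look like under those assumptions: the model visits one emotion class per stage rather than training on all classes jointly. Every name here (`sequential_class_finetune`, `class_loaders`, `epochs_per_class`) is a hypothetical illustration, not the paper's actual API, and the sketch deliberately omits whatever retention mechanism SeQuiFi applies beyond the class-by-class ordering.

```python
import torch
from torch import nn, optim


def sequential_class_finetune(model: nn.Module,
                              class_loaders: dict,
                              epochs_per_class: int = 3,
                              lr: float = 1e-4,
                              device: str = "cpu") -> nn.Module:
    """Fine-tune `model` incrementally, one emotion class at a time.

    `class_loaders` maps each emotion label (e.g. "angry", "happy") to a
    DataLoader that yields (features, label) batches for that class only.
    The dict's iteration order defines the class sequence.
    """
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    for emotion, loader in class_loaders.items():  # one class per stage
        # A fresh optimizer per stage keeps momentum from the previous
        # class from leaking into the current one (an assumption, not
        # something the abstract specifies).
        optimizer = optim.Adam(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs_per_class):
            for features, labels in loader:
                features, labels = features.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(features), labels)
                loss.backward()
                optimizer.step()
    return model
```

Note that naively optimizing cross-entropy on single-class batches would collapse the classifier toward the most recent class; the point of SeQuiFi is precisely to avoid this, so the sketch should be read as the training schedule only, not the full method.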