Cross-corpus speech emotion recognition (SER) is challenging because feature distributions are mismatched across corpora, which can degrade the performance of established SER methods. In this paper, we tackle this challenge by proposing a novel transfer subspace learning method called acoustic knowledge-guided transfer linear regression (AKTLR). Existing approaches often overlook domain-specific knowledge related to SER and simply treat cross-corpus SER as a generic transfer learning task. In contrast, AKTLR is built upon an acoustic knowledge-guided dual sparsity constraint mechanism. This mechanism encodes an empirically validated piece of acoustic knowledge in SER: minimalistic acoustic parameter feature sets can alleviate classifier over-adaptation and therefore generalize better in cross-corpus SER tasks than large feature sets. Through this mechanism, we extend a simple transfer linear regression model to AKTLR. The extended model seeks emotion-discriminative and corpus-invariant features from established acoustic parameter feature sets at two scales: contributive acoustic parameter groups, and the constituent elements within each contributive group. We evaluate the proposed method through extensive cross-corpus SER experiments on three widely used speech emotion corpora: EmoDB, eNTERFACE, and CASIA. The results confirm its effectiveness: AKTLR outperforms recent state-of-the-art cross-corpus SER methods based on transfer subspace learning and on deep transfer learning. Furthermore, our work provides experimental evidence that incorporating domain-specific knowledge into the transfer learning model is both feasible and beneficial for cross-corpus SER tasks.
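The abstract does not state the AKTLR objective, so the following is only a minimal sketch of the general idea of a two-scale (group-level and element-level) sparsity constraint on a transfer linear regression model. It assumes a generic sparse-group-lasso penalty and a simple mean-alignment term between source and target features; the function names, the penalty weights `lam_group`, `lam_elem`, `mu`, and the proximal gradient solver are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def soft_threshold(W, tau):
    """Elementwise soft-thresholding: proximal operator of tau * ||W||_1."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

def group_shrink(W, groups, tau):
    """Block soft-thresholding: proximal operator of tau * sum_g ||W[g]||_F,
    where each g is an index array selecting the rows of one feature group."""
    out = W.copy()
    for idx in groups:
        norm = np.linalg.norm(W[idx])
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out[idx] = scale * W[idx]
    return out

def fit_dual_sparse_transfer_lr(Xs, Ys, Xt, groups,
                                lam_group=0.1, lam_elem=0.05, mu=1.0, n_iter=500):
    """Proximal gradient descent on an ASSUMED objective (not taken from the paper):

        1/(2n) * ||Ys - Xs W||_F^2                 # regression onto source labels
        + mu/2 * ||(mean(Xs) - mean(Xt)) W||^2     # align projected corpus means
        + lam_group * sum_g ||W[g]||_F             # select whole parameter groups
        + lam_elem  * ||W||_1                      # select elements inside groups
    """
    n, d = Xs.shape
    delta = (Xs.mean(axis=0) - Xt.mean(axis=0))[None, :]  # shape (1, d)
    # Hessian of the smooth part; its largest eigenvalue is the Lipschitz constant.
    H = Xs.T @ Xs / n + mu * (delta.T @ delta)
    step = 1.0 / np.linalg.eigvalsh(H)[-1]
    XtY = Xs.T @ Ys / n
    W = np.zeros((d, Ys.shape[1]))
    for _ in range(n_iter):
        W = W - step * (H @ W - XtY)                   # gradient step on smooth part
        W = soft_threshold(W, step * lam_elem)         # element-level sparsity
        W = group_shrink(W, groups, step * lam_group)  # group-level sparsity
    return W
```

Applying the elementwise threshold followed by the group shrinkage is the standard closed-form proximal step for a combined L1 and group-norm penalty. Rows of `W` that shrink to exactly zero correspond to acoustic parameters (or whole groups) that the model discards, which is the sense in which such a penalty favors a minimalistic feature set.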