In this paper, we propose a new unsupervised domain adaptation (DA) method, called layer-adapted implicit distribution alignment networks (LIDAN), to address the challenge of cross-corpus speech emotion recognition (SER). LIDAN extends our previous ICASSP work, deep implicit distribution alignment networks (DIDAN), whose key contribution is a novel regularization term called implicit distribution alignment (IDA). This term allows a DIDAN model trained on source (training) speech samples to remain applicable to predicting the emotion labels of target (testing) speech samples, despite the corpus variance inherent in cross-corpus SER. To further enhance this method, we extend IDA to layer-adapted IDA (LIDA), resulting in LIDAN. This layer-adapted extension consists of three modified IDA terms that consider emotion labels at different levels of granularity. These terms are placed in different fully connected layers of LIDAN, in line with the emotion-discriminative ability of the features, which increases with layer depth. This arrangement enables LIDAN to learn emotion-discriminative and corpus-invariant features for SER across various corpora more effectively than DIDAN. It is also worth mentioning that IDA and LIDA differ from most existing methods, which rely on estimating statistical moments to describe pre-assumed explicit distributions. Instead, they use the idea of target sample reconstruction to directly bridge the feature distribution gap without making assumptions about the distribution type. As a result, DIDAN and LIDAN can be viewed as implicit cross-corpus SER methods. To evaluate LIDAN, we conducted extensive cross-corpus SER experiments on the EmoDB, eNTERFACE, and CASIA corpora. The experimental results demonstrate that LIDAN surpasses recent state-of-the-art explicit unsupervised DA methods in tackling cross-corpus SER tasks.
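The abstract does not give the formulation of IDA, so the following is only a hypothetical sketch of the general contrast it draws between explicit, moment-based alignment and reconstruction-based alignment. It assumes that "target sample reconstruction" means expressing each target feature vector as a ridge-regularized linear combination of source feature vectors and measuring the residual; the function names and the `lam` parameter are illustrative assumptions, not the authors' IDA or LIDA terms. In a real network such a quantity would be computed on learned features and minimized as a differentiable regularizer, which this NumPy sketch does not do.

```python
import numpy as np


def mean_discrepancy(source, target):
    """Explicit, first-moment gap: squared distance between the feature means.

    This is what a linear-kernel MMD measures; it only compares means.
    """
    return float(np.sum((source.mean(axis=0) - target.mean(axis=0)) ** 2))


def reconstruction_discrepancy(source, target, lam=1e-3):
    """Hypothetical reconstruction-based gap (an assumed stand-in, not IDA itself).

    Each target row t is approximated as S^T w, where w solves the ridge problem
        min_w ||t - S^T w||^2 + lam * ||w||^2,
    whose closed form is (S S^T + lam I) w = S t. The mean squared residual
    is small when target features lie in the span of the source features.
    source: (n_s, d) array, target: (n_t, d) array.
    """
    n_s = source.shape[0]
    gram = source @ source.T + lam * np.eye(n_s)        # (n_s, n_s)
    coeffs = np.linalg.solve(gram, source @ target.T)    # (n_s, n_t), one column per target
    recon = coeffs.T @ source                            # (n_t, d) reconstructed targets
    return float(np.mean(np.sum((target - recon) ** 2, axis=1)))


# Toy example: source features vary along the x-axis, target features along the y-axis.
S = np.array([[1.0, 0.0], [-1.0, 0.0]])
T_shifted = np.array([[0.0, 1.0], [0.0, -1.0]])

print(mean_discrepancy(S, T_shifted))            # 0.0 (means coincide)
print(reconstruction_discrepancy(S, T_shifted))  # 1.0 (targets cannot be rebuilt from sources)
```

In this toy case both sets have zero mean, so a first-moment criterion reports no gap, while the reconstruction residual still exposes the mismatch without assuming any distribution family. Higher-order explicit criteria (for example, covariance matching) would also detect this particular shift; the example only illustrates that reconstruction does not require choosing which moments to match.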