In automatic emotion recognition (AER), labels assigned by different human annotators to the same utterance are often inconsistent due to the inherent complexity of emotion and the subjectivity of perception. Although deterministic labels generated by averaging or voting are often used as the ground truth, this practice ignores the intrinsic uncertainty revealed by the inconsistent labels. This paper proposes a Bayesian approach, deep evidential emotion regression (DEER), to estimate the uncertainty in emotion attributes. Treating the emotion attribute labels of an utterance as samples drawn from an unknown Gaussian distribution, DEER places an utterance-specific normal-inverse-gamma prior over the Gaussian likelihood and predicts its hyper-parameters using a deep neural network model. This enables joint estimation of the emotion attributes along with their aleatoric and epistemic uncertainties. AER experiments on the widely used MSP-Podcast and IEMOCAP datasets showed that DEER produced state-of-the-art results for both the mean values and the distribution of emotion attributes.
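The abstract does not give the formulas, so the sketch below is an illustration under stated assumptions rather than the paper's implementation. It assumes the standard deep evidential regression parameterisation, with prior $\mu \mid \sigma^2 \sim \mathcal{N}(\gamma, \sigma^2/\nu)$ and $\sigma^2 \sim \text{Inv-Gamma}(\alpha, \beta)$, where a network head outputs $(\gamma, \nu, \alpha, \beta)$ per utterance. The predicted attribute is $\mathbb{E}[\mu] = \gamma$, the aleatoric uncertainty is $\mathbb{E}[\sigma^2] = \beta/(\alpha-1)$, and the epistemic uncertainty is $\mathrm{Var}[\mu] = \beta/(\nu(\alpha-1))$. The `log_marginal` method scores all annotator labels of one utterance jointly by integrating out $\mu$ and $\sigma^2$ with the conjugate closed form. It is one natural way to fit the prior to multiple labels and is not necessarily DEER's exact training loss. The class `NIG` and its method names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class NIG:
    """Hypothetical container for one utterance's predicted NIG hyper-parameters."""
    gamma: float  # location: the predicted emotion attribute value
    nu: float     # > 0, pseudo-count (evidence) for the mean
    alpha: float  # > 1 so that the moments below are finite
    beta: float   # > 0, scale of the inverse-gamma over sigma^2

    def mean(self) -> float:
        # E[mu] = gamma
        return self.gamma

    def aleatoric(self) -> float:
        # E[sigma^2] = beta / (alpha - 1): expected spread of annotator labels
        return self.beta / (self.alpha - 1.0)

    def epistemic(self) -> float:
        # Var[mu] = beta / (nu * (alpha - 1)): uncertainty about the mean itself
        return self.beta / (self.nu * (self.alpha - 1.0))

    def log_marginal(self, labels: list[float]) -> float:
        """Exact log p(y_1..y_n) with mu and sigma^2 integrated out (conjugate NIG)."""
        n = len(labels)
        ybar = sum(labels) / n
        ss = sum((y - ybar) ** 2 for y in labels)  # within-utterance scatter
        nu_n = self.nu + n
        alpha_n = self.alpha + n / 2.0
        beta_n = (self.beta + 0.5 * ss
                  + self.nu * n * (ybar - self.gamma) ** 2 / (2.0 * nu_n))
        return (math.lgamma(alpha_n) - math.lgamma(self.alpha)
                + self.alpha * math.log(self.beta) - alpha_n * math.log(beta_n)
                + 0.5 * (math.log(self.nu) - math.log(nu_n))
                - 0.5 * n * math.log(2.0 * math.pi))
```

For example, `NIG(gamma=0.0, nu=1.0, alpha=2.0, beta=1.0)` yields aleatoric and epistemic variances of 1.0 each, and doubling `nu` halves only the epistemic term. A set of labels that agree with each other and lie near `gamma` receives a higher `log_marginal` than a set of labels far from it. With a single label, the expression reduces to the Student-t marginal used in single-target evidential regression.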