Speech emotion recognition (SER) has traditionally relied on categorical or dimensional labels. However, such labels are limited in both the diversity of emotions they can represent and their interpretability. To overcome this limitation, we focus on color attributes, such as hue, saturation, and value, to represent emotions as continuous, interpretable scores. We annotated an emotional speech corpus with color attributes via crowdsourcing and analyzed the annotations. We then built regression models for color attributes in SER using machine learning and deep learning, and explored multitask learning of color attribute regression and emotion classification. Our results demonstrate a relationship between color attributes and emotions in speech and show that color attribute regression models for SER can be developed successfully. Multitask learning also improved the performance of each task.
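To make the combination of color attribute regression and emotion classification concrete, the following is a minimal sketch and not the authors' implementation. It assumes that annotated colors are available as 8-bit RGB triples, that hue, saturation, and value are each scaled to [0, 1], and that the two tasks are combined as a weighted sum of a mean squared error and a softmax cross-entropy. The function names and the weight `alpha` are hypothetical and do not come from the abstract.

```python
import colorsys

import numpy as np


def rgb_to_hsv_scores(rgb):
    """Convert an 8-bit RGB triple to (hue, saturation, value), each in [0, 1].

    Assumption: annotations arrive as RGB; the abstract does not say so.
    """
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)


def multitask_loss(hsv_pred, hsv_true, class_logits, class_true, alpha=0.5):
    """Weighted sum of a regression loss and a classification loss.

    hsv_pred, hsv_true: arrays of shape (batch, 3) with HSV scores.
    class_logits: array of shape (batch, n_classes) with unnormalized scores.
    class_true: integer emotion class index per example.
    alpha: hypothetical task weight in [0, 1].
    """
    hsv_pred = np.asarray(hsv_pred, dtype=float)
    hsv_true = np.asarray(hsv_true, dtype=float)
    logits = np.asarray(class_logits, dtype=float)
    class_true = np.asarray(class_true, dtype=int)

    # Regression term: plain MSE. This treats hue as linear and ignores
    # that hue is circular, which a real system may want to handle.
    mse = np.mean((hsv_pred - hsv_true) ** 2)

    # Classification term: numerically stable softmax cross-entropy.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(class_true)), class_true])

    return alpha * mse + (1.0 - alpha) * ce
```

In this sketch, `alpha` close to 1 emphasizes the color attribute regression and `alpha` close to 0 emphasizes emotion classification; how the tasks are actually weighted in the paper is not stated in the abstract.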