Despite recent advancements in speech emotion recognition (SER) models, state-of-the-art deep learning (DL) approaches face the challenge of limited annotated data. Large language models (LLMs) have revolutionised our understanding of natural language, introducing emergent properties that broaden comprehension in language, speech, and vision. This paper examines the potential of LLMs to annotate abundant speech data, aiming to enhance the state-of-the-art in SER. We evaluate this capability across various settings using publicly available speech emotion classification datasets. Leveraging ChatGPT, we experimentally demonstrate the promising role of LLMs in speech emotion data annotation. Our evaluation encompasses single-shot and few-shot scenarios, revealing performance variability in SER. Notably, we achieve improved results through data augmentation, incorporating ChatGPT-annotated samples into existing datasets. Our work uncovers new frontiers in speech emotion classification, highlighting the increasing significance of LLMs in this field moving forward.