Despite recent advances in speech emotion recognition (SER), state-of-the-art deep learning (DL) approaches remain constrained by the limited availability of annotated data. Large language models (LLMs) have revolutionised our understanding of natural language, introducing emergent properties that broaden comprehension in language, speech, and vision. This paper examines the potential of LLMs to annotate abundant speech data, with the aim of advancing the state of the art in SER. We evaluate this capability across various settings using publicly available speech emotion classification datasets. Using ChatGPT, we experimentally demonstrate the promising role of LLMs in speech emotion data annotation. Our evaluation covers single-shot and few-shot scenarios and reveals performance variability in SER. Notably, we achieve improved results through data augmentation, incorporating ChatGPT-annotated samples into existing datasets. Our work uncovers new frontiers in speech emotion classification and highlights the growing significance of LLMs in this field.