Speech Emotion Recognition (SER) plays a pivotal role in understanding human communication and in enabling emotionally intelligent systems, and it is a fundamental component in the development of Artificial General Intelligence (AGI). However, deploying SER in real-world, spontaneous, and low-resource scenarios remains a significant challenge, owing to the complexity of emotional expression and the limitations of current speech and language technologies. This thesis investigates the integration of Automatic Speech Recognition (ASR) into SER, with the goal of making emotion recognition from spoken language more robust, more scalable, and more practical to apply.