Speech Emotion Recognition (SER) plays a pivotal role in understanding human communication and enabling emotionally intelligent systems, and it serves as a fundamental component in the development of Artificial General Intelligence (AGI). However, deploying SER in real-world, spontaneous, and low-resource scenarios remains a significant challenge because of the complexity of emotional expression and the limitations of current speech and language technologies. This thesis investigates the integration of Automatic Speech Recognition (ASR) into SER, with the goal of enhancing the robustness, scalability, and practical applicability of emotion recognition from spoken language.