Speech emotion recognition (SER) is the process of identifying human emotions and affective states from speech. It rests on the observation that the tone and pitch of the voice frequently convey the speaker's underlying emotion, and recognizing emotion from speech has become an increasingly popular and sought-after capability. Using the relevant factors present in the data (such as modality, emotion, intensity, and repetition), my research applies a Convolutional Neural Network (CNN) to distinguish emotions in audio recordings and label each recording according to the range of emotions it expresses. With the aid of machine learning methods, I have developed a model that identifies emotions from supplied audio files. Evaluation focuses chiefly on precision, recall, and F1 score, the standard machine learning metrics. The main objective is to investigate the influence of, and cross-relations among, all input and output parameters so that the machine learning framework can be set up and trained properly. To improve the recognition of intent, a key precondition for communication, I evaluate emotional state from the voice with a purpose-built machine learning algorithm, supporting digital healthcare applications and helping to bridge the gap between humans and artificial intelligence (AI).
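As a minimal sketch of the evaluation metrics named above, the following computes per-class precision, recall, and F1 for one emotion label; the label names and predictions are hypothetical, not taken from the study's dataset.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall, and F1 for one emotion label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold labels and model predictions over four emotion classes
y_true = ["happy", "sad", "angry", "happy", "neutral", "sad"]
y_pred = ["happy", "sad", "happy", "happy", "neutral", "angry"]
p, r, f = precision_recall_f1(y_true, y_pred, "happy")
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.67 1.0 0.8
```

In a multi-class SER setting these per-class scores are typically averaged (macro or weighted) across all emotion labels to summarize overall performance.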