SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech

Interpreting the human voice matters in a wide range of applications. This study predicts age, gender, and emotion from vocal cues. Advances in voice analysis technology span many domains, from improving customer interactions to enhancing healthcare and retail experiences. Emotion recognition can support mental health care, while age and gender detection are useful in many contexts. We explore deep learning models for these predictions and compare single-output, multi-output, and sequential models. Sourcing suitable data was challenging, so we combined the CREMA-D and EMO-DB datasets. Prior work showed promise in predicting each variable individually, but little research has considered all three simultaneously. This paper identifies flaws in the individual-model approach and advocates our novel multi-output learning architecture, the Speech-based Emotion, Gender and Age Analysis (SEGAA) model. The experiments suggest that multi-output models perform comparably to individual models, efficiently capturing the intricate relationships between the variables and speech inputs while achieving improved runtime.
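To make the multi-output idea concrete, the following is a minimal sketch of a shared feature trunk feeding three separate prediction heads (emotion, gender, age). It is not the SEGAA architecture itself, whose layers are not described in the abstract. The label sets, layer sizes, class name `MultiOutputSketch`, and the 13-value feature vector are all assumptions made for illustration, and the weights are random and untrained.

```python
import math
import random

# Hypothetical label sets, chosen only for illustration.
EMOTIONS = ["angry", "happy", "neutral", "sad"]
GENDERS = ["female", "male"]
AGE_GROUPS = ["young", "middle", "senior"]


def make_layer(n_in, n_out, rng):
    """Return (weights, biases): n_out rows of n_in random weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply a dense layer: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultiOutputSketch:
    """One shared trunk, three task-specific heads (untrained, illustrative)."""

    def __init__(self, n_features, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.trunk = make_layer(n_features, n_hidden, rng)
        self.heads = {
            "emotion": (make_layer(n_hidden, len(EMOTIONS), rng), EMOTIONS),
            "gender": (make_layer(n_hidden, len(GENDERS), rng), GENDERS),
            "age": (make_layer(n_hidden, len(AGE_GROUPS), rng), AGE_GROUPS),
        }

    def predict(self, features):
        # The shared representation is computed once and reused by every head.
        hidden = [math.tanh(h) for h in linear(features, self.trunk)]
        result = {}
        for task, (layer, labels) in self.heads.items():
            probs = softmax(linear(hidden, layer))
            result[task] = dict(zip(labels, probs))
        return result


model = MultiOutputSketch(n_features=13)
features = [0.1 * i for i in range(13)]  # stand-in for acoustic features
prediction = model.predict(features)
for task, probs in prediction.items():
    print(task, max(probs, key=probs.get))
```

In this sketch the trunk runs once per utterance and all three heads reuse its output, whereas three individual models would each process the input separately. That sharing is one plausible source of the runtime benefit the abstract reports, though the paper's actual design may differ.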

