Speech Emotion Recognition (SER) plays a crucial role in advancing human-computer interaction and speech processing. We introduce a novel deep-learning architecture designed specifically for a functional data model known as the multiple-index functional model. Our key innovation lies in integrating adaptive basis layers and an automated data-transformation search within the deep learning framework; simulations on this new model show strong performance. This architecture allows us to extract features tailored for chunk-level SER, based on Mel-Frequency Cepstral Coefficients (MFCCs). We demonstrate the effectiveness of our approach on the benchmark IEMOCAP database, achieving competitive performance against existing methods.
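To make the feature-extraction step concrete, below is a minimal from-scratch sketch of chunk-level MFCC computation with NumPy and SciPy. This is an illustration only: the frame length, hop size, filter count, coefficient count, and chunk length are hypothetical choices, not the configuration used in the paper.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_chunks(signal, sr=16000, frame_len=400, hop=160, nfft=512,
                n_filters=26, n_mfcc=13, chunk_frames=50):
    """Frame a waveform, compute MFCCs, and group frames into fixed-size chunks.

    All parameter values are illustrative defaults, not the paper's settings.
    """
    # 1) Short-time framing with a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2) Power spectrum of each frame
    spec = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # 3) Triangular mel filterbank spanning 0 .. sr/2
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_filters + 2)
    bins = np.floor((nfft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 4) Log mel energies, then DCT; keep the first n_mfcc coefficients
    energies = np.log(spec @ fbank.T + 1e-10)
    mfcc = dct(energies, type=2, axis=1, norm='ortho')[:, :n_mfcc]
    # 5) Group consecutive frames into equal-length chunks for chunk-level SER
    n_chunks = n_frames // chunk_frames
    return mfcc[:n_chunks * chunk_frames].reshape(n_chunks, chunk_frames, n_mfcc)

# One second of a 440 Hz tone as a stand-in for real speech
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc_chunks(sig)
print(feats.shape)  # (n_chunks, chunk_frames, n_mfcc)
```

In practice a library such as librosa would typically replace steps 1-4; the key point for chunk-level SER is step 5, where frame-level MFCCs are regrouped into fixed-length chunks that serve as inputs to the downstream model.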