This paper builds on an existing speech emotion recognition model by adding a second LSTM layer to improve both the accuracy and the processing efficiency of emotion recognition from audio data. By capturing long-term dependencies within audio sequences through a dual-layer LSTM network, the model recognizes and classifies complex emotional patterns more accurately. Experiments on the RAVDESS dataset validate this approach: the modified dual-layer LSTM model improves accuracy by 2% over the single-layer LSTM while significantly reducing recognition latency, thereby enhancing real-time performance. These results indicate that the dual-layer LSTM architecture is well suited to handling emotional features with long-term dependencies and offers a practical optimization for speech emotion recognition systems. The work serves as a reference for applications such as intelligent customer service, sentiment analysis, and human-computer interaction.
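The abstract does not specify the feature front-end or hyperparameters; as a minimal sketch, the following PyTorch snippet shows one common way a stacked (dual-layer) LSTM classifier of this kind might be set up, assuming 40-dimensional frame-level MFCC features and the 8 emotion classes of RAVDESS. The layer sizes and dropout value are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class DualLayerLSTMClassifier(nn.Module):
    """Stacked two-layer LSTM over frame-level audio features (e.g. MFCCs)."""
    def __init__(self, n_features=40, hidden_size=128, n_classes=8, dropout=0.3):
        super().__init__()
        # num_layers=2 stacks a second LSTM on top of the first, letting the
        # upper layer model longer-range emotional context in the sequence.
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size,
                            num_layers=2, batch_first=True, dropout=dropout)
        self.classifier = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, time, n_features)
        _, (h_n, _) = self.lstm(x)
        # h_n[-1] is the final hidden state of the top (second) LSTM layer,
        # used here as a fixed-length summary of the whole utterance.
        return self.classifier(h_n[-1])

# Example: a batch of 4 utterances, each 300 frames of 40-dim features
model = DualLayerLSTMClassifier()
logits = model(torch.randn(4, 300, 40))
print(logits.shape)  # torch.Size([4, 8]) -- one score per emotion class
```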