Speech emotion recognition (SER) often suffers reduced performance in the presence of background noise. In addition, making a prediction on a signal that contains only background noise could undermine user trust in the system. In this study, we propose NRSER, a Noise Robust Speech Emotion Recognition system. NRSER employs speech enhancement (SE) to reduce the noise in input signals. A signal-to-noise ratio (SNR)-level detection structure and a waveform reconstitution strategy are then introduced to reduce the negative impact of SE on speech signals with little or no background noise. Our experimental results show that NRSER effectively improves the noise robustness of the SER system, including preventing the system from performing emotion recognition on signals consisting solely of background noise. Moreover, the proposed SNR-level detection structure can be used on its own for tasks such as data selection.
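
The idea of applying SE less aggressively to cleaner inputs can be illustrated with a minimal sketch. This is not the NRSER implementation: the function names, the 0 dB and 20 dB thresholds, and the linear blending rule are all assumptions made for illustration, and a deployed system would have to estimate the SNR from the noisy signal alone rather than compute it from separate speech and noise tracks.

```python
import numpy as np


def snr_db(speech, noise, eps=1e-12):
    """Signal-to-noise ratio in dB, given separate speech and noise tracks.

    Only possible when both tracks are known (e.g. synthetic mixtures).
    """
    p_speech = np.mean(np.square(speech))
    p_noise = np.mean(np.square(noise))
    return 10.0 * np.log10((p_speech + eps) / (p_noise + eps))


def enhanced_weight(snr, low_db=0.0, high_db=20.0):
    """Weight given to the enhanced waveform (hypothetical rule).

    1.0 at or below low_db (noisy input: trust the enhancer),
    0.0 at or above high_db (clean input: keep the original),
    linear in between. The thresholds are illustrative assumptions.
    """
    return float(np.clip((high_db - snr) / (high_db - low_db), 0.0, 1.0))


def reconstitute(original, enhanced, snr):
    """Blend the original and enhanced waveforms according to the SNR."""
    w = enhanced_weight(snr)
    return w * enhanced + (1.0 - w) * original
```

With these assumed thresholds, an input at 10 dB SNR would be reconstituted as an equal mix of the original and enhanced waveforms, while an input at 20 dB or above would bypass the enhancer's output entirely.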