Speech Emotion Recognition (SER) systems often rely on static features such as Mel-Frequency Cepstral Coefficients (MFCCs), Zero Crossing Rate (ZCR), and Root Mean Square Energy (RMSE). As a result, they can misclassify emotions when the vocal signal contains acoustic noise. To address this, we augmented the static features with dynamic spectral features (Deltas and Delta-Deltas) and applied the Kalman Smoothing algorithm. This approach reduced noise and improved emotion classification. Because emotion changes over time, Kalman smoothing also made the classifier outputs more stable. Tests on the RAVDESS dataset showed that this method achieved a state-of-the-art accuracy of 87\% and reduced misclassification between emotions with similar acoustic features.
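The two ingredients named in the abstract, delta features and Kalman smoothing, can be illustrated with a minimal NumPy-only sketch. This is not the authors' implementation. The function names `add_deltas` and `kalman_smooth`, the random-walk state model, and the noise variances `process_var` and `obs_var` are assumptions chosen for illustration. `np.gradient` stands in for the regression-based delta that audio toolkits commonly use.

```python
import numpy as np


def add_deltas(features):
    """Stack static features with first- and second-order time differences.

    features: array of shape (n_features, n_frames), e.g. MFCCs per frame.
    Returns an array of shape (3 * n_features, n_frames).
    """
    # Finite-difference approximation of the Delta (assumption: a simple
    # stand-in for the windowed regression used by many toolkits).
    delta = np.gradient(features, axis=1)
    # Delta-Delta: the same difference applied to the Delta track.
    delta2 = np.gradient(delta, axis=1)
    return np.vstack([features, delta, delta2])


def kalman_smooth(z, process_var=1e-3, obs_var=1e-1):
    """Smooth a 1-D sequence with a random-walk Kalman filter and a
    Rauch-Tung-Striebel backward pass.

    process_var and obs_var are illustrative values, not tuned settings.
    """
    z = np.asarray(z, dtype=float)
    n = z.size
    x_pred = np.empty(n)
    p_pred = np.empty(n)
    x_filt = np.empty(n)
    p_filt = np.empty(n)

    # Forward pass: initialise at the first observation with a broad prior.
    x, p = z[0], 1.0
    for t in range(n):
        # Predict: the state is assumed constant, so only uncertainty grows.
        x_pred[t] = x
        p_pred[t] = p + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = p_pred[t] / (p_pred[t] + obs_var)
        x = x_pred[t] + gain * (z[t] - x_pred[t])
        p = (1.0 - gain) * p_pred[t]
        x_filt[t], p_filt[t] = x, p

    # Backward pass: refine each estimate using the frames that follow it.
    x_smooth = x_filt.copy()
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        x_smooth[t] = x_filt[t] + g * (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth
```

Under these assumptions, `add_deltas` would turn a 13-coefficient MFCC matrix into a 39-row feature matrix, and `kalman_smooth` could be applied to any single track over time, such as one feature trajectory or one per-class probability sequence from the classifier.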