Recognizing physiological stress and emotion is important for health monitoring and affective computing. In this work, we present a comprehensive evaluation of three deep learning architectures, Long Short-Term Memory (LSTM) networks, Temporal Convolutional Networks (TCN), and Transformers, on the WESAD dataset for multimodal affect recognition from wrist and chest sensor signals. We apply early fusion at the sensor level by concatenating wrist and chest signals before feeding them into each model, and we perform ablation studies that assess the contribution of each modality by training models on wrist-only and chest-only inputs. In addition, we implement a late-fusion ensemble that combines the predictions of all three architectures trained on multimodal input. Our results show that, among the individual architectures, Transformer models consistently achieve the highest accuracy in multimodal settings, while TCN models perform best in the wrist-only configuration. The ensemble yields the highest overall accuracy (98.91 ± 0.13%) and macro-F1 score (98.56 ± 0.17%). These findings demonstrate the effectiveness of sensor-level fusion and ensemble-based late fusion for building robust physiological emotion recognition systems.
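The two fusion strategies named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that wrist and chest signals have already been resampled and segmented into windows of equal length, and that the ensemble averages class probabilities, which is only one possible combination rule (the abstract does not say whether averaging, voting, or another scheme is used). The function names, channel counts, and probability values are hypothetical.

```python
import numpy as np


def early_fusion(wrist, chest):
    """Sensor-level fusion: concatenate windows along the channel axis.

    Both arrays are assumed to have shape
    (n_windows, n_timesteps, n_channels) with matching first two axes.
    """
    if wrist.shape[:2] != chest.shape[:2]:
        raise ValueError("wrist and chest windows must be aligned")
    return np.concatenate([wrist, chest], axis=-1)


def late_fusion(prob_list):
    """Decision-level fusion: average per-model class probabilities.

    Averaging is an assumed combination rule. Each array in prob_list
    has shape (n_windows, n_classes). Returns one label per window.
    """
    stacked = np.stack(prob_list, axis=0)  # (n_models, n_windows, n_classes)
    return stacked.mean(axis=0).argmax(axis=1)


# Hypothetical channel counts (6 wrist, 8 chest), random stand-in data.
rng = np.random.default_rng(0)
wrist = rng.normal(size=(4, 64, 6))
chest = rng.normal(size=(4, 64, 8))
fused = early_fusion(wrist, chest)  # one input tensor for any of the models

# Stand-in softmax outputs from three models for two windows.
lstm_probs = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
tcn_probs = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
transformer_probs = np.array([[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
labels = late_fusion([lstm_probs, tcn_probs, transformer_probs])
```

In this sketch the wrist-only and chest-only ablations correspond to passing `wrist` or `chest` to a model directly instead of `fused`.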