Multimodal emotion recognition from physiological signals is receiving increasing attention because, unlike behavioral reactions, such signals cannot be controlled at will and therefore provide more reliable information. Existing deep learning-based methods still rely on handcrafted features, which does not take full advantage of the learning ability of neural networks, and often adopt a single-modality approach, even though human emotions are inherently expressed in a multimodal way. In this paper, we propose a hypercomplex multimodal network equipped with a novel fusion module comprising parameterized hypercomplex multiplications. Because operations in a hypercomplex domain follow algebraic rules, the module can model latent relations among learned feature dimensions, leading to a more effective fusion step. We classify valence and arousal from electroencephalogram (EEG) and peripheral physiological signals using the publicly available MAHNOB-HCI database, surpassing a multimodal state-of-the-art network. The code of our work is freely available at https://github.com/ispamm/MHyEEG.
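As a rough illustration of what a parameterized hypercomplex multiplication computes, the sketch below uses the common formulation in which a weight matrix is built as a sum of Kronecker products, W = sum_i A_i ⊗ F_i, where the A_i encode the algebra rules and the F_i hold the filter weights. This is an assumption about the general technique, not a description of the fusion module in this work; the names `kron`, `phm_weight`, `matvec`, `ALGEBRA`, and `FILTERS` are hypothetical. In a trainable layer both A_i and F_i would typically be learned; here the A_i are fixed to the complex-number rules (n = 2) so the result can be checked by hand. Plain Python lists are used to keep the example dependency-free.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def phm_weight(algebra, filters):
    """Build W = sum_i kron(A_i, F_i) from paired algebra and filter matrices."""
    terms = [kron(a, f) for a, f in zip(algebra, filters)]
    rows, cols = len(terms[0]), len(terms[0][0])
    return [
        [sum(t[r][c] for t in terms) for c in range(cols)]
        for r in range(rows)
    ]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(w_rc * x_c for w_rc, x_c in zip(row, x)) for row in w]


# Assumption for illustration: n = 2 with the A_i fixed to the complex algebra
# (A_1 is the identity, A_2 encodes i * i = -1).
ALGEBRA = [
    [[1, 0], [0, 1]],
    [[0, -1], [1, 0]],
]
# Filter weights F_1 = [[2]] and F_2 = [[3]], i.e. the complex weight 2 + 3i.
FILTERS = [[[2]], [[3]]]

weight = phm_weight(ALGEBRA, FILTERS)   # [[2, -3], [3, 2]]
fused = matvec(weight, [1, 4])          # input 1 + 4i as [real, imaginary]
print(fused)                            # [-10, 11], matching (2 + 3i)(1 + 4i)
```

With these fixed A_i the layer reproduces ordinary complex multiplication. Letting the A_i be learned instead is what allows the interaction rules among feature dimensions to be fitted from data rather than fixed in advance.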