We introduce a wearable single-eye emotion recognition device and a real-time approach that recognizes emotions from partial observations and is robust to changes in lighting conditions. At the heart of our method is a bio-inspired event-based camera setup and a newly designed lightweight Spiking Eye Emotion Network (SEEN). Compared to conventional cameras, event-based cameras offer a higher dynamic range (up to 140 dB vs. 80 dB) and a higher temporal resolution. Thus, the captured events can encode rich temporal cues under challenging lighting conditions. However, these events lack texture information, which makes it difficult to decode the temporal information effectively. SEEN tackles this issue from two different perspectives. First, we adopt convolutional spiking layers to take advantage of the ability of spiking neural networks to decode pertinent temporal information. Second, SEEN learns to extract essential spatial cues from the corresponding intensity frames and leverages a novel weight-copy scheme to convey spatial attention to the convolutional spiking layers during training and inference. We extensively validate and demonstrate the effectiveness of our approach on a specially collected Single-eye Event-based Emotion (SEE) dataset. To the best of our knowledge, our method is the first eye-based emotion recognition method that leverages event-based cameras and spiking neural networks.
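To illustrate how a spiking layer can carry temporal information, the following is a minimal sketch of a single leaky integrate-and-fire (LIF) neuron, a common building block of spiking neural networks. It is not the SEEN architecture: the neuron model, the function name `lif_run`, and the `decay` and `threshold` values are assumptions chosen for illustration, since the abstract does not specify them.

```python
def lif_run(inputs, decay=0.5, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over a sequence of inputs.

    Hypothetical helper for illustration; `decay` and `threshold` are
    assumed values, not parameters taken from the paper.
    Returns one binary spike (0 or 1) per time step.
    """
    v = 0.0  # membrane potential, carried across time steps
    spikes = []
    for x in inputs:
        v = decay * v + x  # leak the old potential, then integrate the input
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after emitting a spike
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    # Three weak inputs in a row accumulate into a spike; one strong input
    # spikes immediately.
    print(lif_run([0.6, 0.6, 0.6, 0.0, 1.2]))
```

Because the membrane potential persists between time steps, the output depends on the timing of the inputs as well as their magnitude, which is the property the convolutional spiking layers are said to exploit.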