This work introduces GazeSCRNN, a novel spiking convolutional recurrent neural network designed for event-based near-eye gaze tracking. Leveraging the high temporal resolution and energy efficiency of Dynamic Vision Sensor (DVS) cameras, and their natural fit with spike-based processing, GazeSCRNN uses a spiking neural network (SNN) to address the limitations of traditional gaze-tracking systems in capturing dynamic eye movements. The proposed model processes event streams from DVS cameras using Adaptive Leaky-Integrate-and-Fire (ALIF) neurons and a hybrid architecture optimized for spatio-temporal data. Extensive evaluations on the EV-Eye dataset demonstrate that the model predicts gaze vectors accurately. In addition, ablation studies reveal the importance of the ALIF neurons, dynamic event framing, and training techniques such as Forward-Propagation-Through-Time in enhancing overall system performance. The most accurate model achieved a Mean Angle Error (MAE) of 6.034° and a Mean Pupil Error (MPE) of 2.094 mm. This work thus demonstrates the feasibility of using SNNs for event-based gaze tracking, while shedding light on critical challenges and opportunities for further improvement.
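To make the ALIF building block concrete, the following is a minimal sketch of one discrete-time update of an Adaptive Leaky-Integrate-and-Fire neuron in the standard formulation (leaky membrane integration plus a spike-driven adaptive threshold). All parameter names and values here are illustrative assumptions, not the paper's actual hyperparameters or implementation.

```python
import numpy as np

def alif_step(v, b, x, tau_m=20.0, tau_a=200.0, b0=1.0, beta=1.8, dt=1.0):
    """One discrete-time ALIF update (illustrative, not the paper's code).

    v: membrane potential, b: threshold adaptation variable,
    x: input current at this step. Returns (spike, v, b).
    """
    alpha = np.exp(-dt / tau_m)       # membrane leak factor per step
    rho = np.exp(-dt / tau_a)         # decay of the threshold adaptation
    v = alpha * v + x                 # leaky integration of the input
    theta = b0 + beta * b             # adaptive firing threshold
    s = (v >= theta).astype(v.dtype)  # emit a spike on threshold crossing
    v = v - s * theta                 # soft reset: subtract the threshold
    b = rho * b + (1.0 - rho) * s     # raise the threshold after a spike
    return s, v, b
```

The adaptation variable `b` makes the effective threshold rise after each spike and decay back slowly, which is what distinguishes ALIF from a plain LIF neuron and helps the network retain longer temporal context in recurrent spiking layers.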