In this paper, we address gaze vector prediction, a task with applications ranging from human-computer interaction to driver monitoring systems. Our approach targets extremely low-light conditions and combines a novel temporal event encoding scheme with a dedicated neural network architecture. The temporal encoding method integrates Dynamic Vision Sensor (DVS) events with grayscale guide frames to generate consecutively encoded images, which serve as input to the network. We also introduce a curated dataset tailored to low-light conditions that captures diverse gaze responses from participants within the active age group. Paired with our network, the encoded temporal frames yield predictions with strong spatial localization and reliable gaze direction. The network achieves a 100-pixel accuracy of 100%, demonstrating that it can use temporally consecutive encoded images to predict gaze vectors precisely in challenging low-light videos.
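The abstract reports a 100-pixel accuracy without defining it. One common reading, assumed here rather than taken from the paper, is the fraction of predicted gaze points that fall within 100 pixels (Euclidean distance) of the ground-truth point. The minimal sketch below illustrates that reading; the function name `pixel_accuracy`, its parameters, and the sample coordinates are hypothetical.

```python
import math

def pixel_accuracy(predicted, ground_truth, threshold_px=100.0):
    """Fraction of predictions within threshold_px of the ground truth.

    Assumption: "N-pixel accuracy" means the share of predicted 2D gaze
    points whose Euclidean distance to the labelled point is <= N pixels.
    The paper may define the metric differently.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have equal length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= threshold_px
    )
    return hits / len(predicted)

# Illustrative (made-up) image coordinates, in pixels.
predicted = [(320.0, 240.0), (100.0, 100.0), (500.0, 400.0)]
ground_truth = [(350.0, 260.0), (100.0, 250.0), (510.0, 390.0)]

# The second prediction is 150 px away from its label, so it counts as a miss.
accuracy = pixel_accuracy(predicted, ground_truth)
```

Under this reading, a reported value of 100% would mean that every prediction in the evaluation set landed within 100 pixels of its label; the threshold is relative to image resolution, so the metric is best interpreted alongside the frame size.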