We present a real-time gaze tracking system that directly acquires task-relevant latent features using a fully passive optical encoder. Instead of forming and processing full-resolution images, our approach uses a microlens array with a co-designed binary chromium mask to perform spatially multiplexed optical encoding, producing a compact set of measurements sufficient for gaze estimation. By integrating sensing and feature extraction in the optical domain, the proposed system eliminates the need for high-bandwidth image readout and substantially reduces computational overhead. The encoded measurements are captured by a 4 × 4 phototransistor array and mapped to gaze direction by a lightweight neural network. Our proof-of-concept prototype achieves an end-to-end sensing-to-inference latency of 3.4 ms, lower than that of published research systems. We demonstrate the effectiveness of our approach on both simulated and real-world data, achieving competitive gaze estimation accuracy while significantly reducing latency and improving energy efficiency compared to conventional camera-based pipelines. This work highlights the potential of task-driven optical sensing for ultra-low-latency, computationally efficient human-computer interaction systems.
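To make the sensing-to-inference pipeline described in the abstract concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes that each of the 16 phototransistors integrates the light transmitted through its own binary mask pattern, so that the optical encoder behaves like a non-negative linear projection of the scene, and that the lightweight network is a small two-layer perceptron with a two-component (yaw, pitch) output. The scene size, hidden width, random masks, output parameterization, and all function names are hypothetical placeholders; in the actual system the mask is co-designed and the network is trained.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only.
SCENE_SHAPE = (32, 32)   # assumed resolution of the simulated eye-region irradiance
NUM_SENSORS = 16         # 4 x 4 phototransistor array (from the abstract)
HIDDEN_UNITS = 32        # assumed hidden width of the lightweight network


def make_binary_masks(rng, num_sensors=NUM_SENSORS, scene_shape=SCENE_SHAPE):
    """Random 0/1 transmission patterns, one per sensor.

    Placeholder for a co-designed mask: 1 = transparent, 0 = opaque.
    """
    return rng.integers(0, 2, size=(num_sensors, *scene_shape)).astype(np.float64)


def optical_encode(scene, masks):
    """Simulate passive encoding: each sensor sums the light passing its mask."""
    # Contract the two spatial axes of the masks with the scene -> shape (num_sensors,)
    return np.tensordot(masks, scene, axes=([1, 2], [0, 1]))


def init_mlp(rng, n_in=NUM_SENSORS, n_hidden=HIDDEN_UNITS, n_out=2):
    """Untrained two-layer perceptron parameters (assumed architecture)."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def predict_gaze(measurements, params):
    """Map the 16 measurements to an assumed (yaw, pitch) gaze estimate."""
    hidden = np.maximum(0.0, measurements @ params["W1"] + params["b1"])  # ReLU
    return hidden @ params["W2"] + params["b2"]


rng = np.random.default_rng(0)
masks = make_binary_masks(rng)
scene = rng.random(SCENE_SHAPE)            # stand-in for eye-region irradiance
measurements = optical_encode(scene, masks)
params = init_mlp(rng)
# Normalize by the largest reading so the input scale is independent of brightness.
gaze = predict_gaze(measurements / measurements.max(), params)
print(measurements.shape, gaze.shape)
```

Because the network here is untrained, the predicted values carry no meaning; the sketch only illustrates the data flow, in which the electronics read out and process 16 scalar measurements rather than a full image.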