Eye tracking has become an essential human-machine interaction modality for delivering immersive experiences in virtual and augmented reality (VR/AR) applications, which demand high throughput (e.g., 240 FPS), a small form factor, and enhanced visual privacy. However, existing eye tracking systems are still limited by (1) a large form factor, mainly due to their bulky lens-based cameras, and (2) the high communication cost between the camera and the backend processor, both of which prohibit more extensive adoption. To this end, we propose EyeCoD, a lensless FlatCam-based eye tracking algorithm and accelerator co-design framework that enables eye tracking systems with a much smaller form factor and higher system efficiency without sacrificing tracking accuracy, paving the way for next-generation eye tracking solutions. On the system level, we advocate the use of lensless FlatCams to meet the small form factor requirement of mobile eye tracking systems. On the algorithm level, EyeCoD integrates a predict-then-focus pipeline that first predicts the region of interest (ROI) via segmentation and then processes only the ROI to estimate gaze direction, greatly reducing redundant computation and data movement. On the hardware level, we further develop a dedicated accelerator that (1) integrates a novel workload orchestration between the aforementioned segmentation and gaze estimation models, (2) leverages intra-channel reuse opportunities in depth-wise layers, and (3) utilizes input feature-wise partitioning to reduce the activation memory size. On-silicon measurements validate that EyeCoD consistently reduces both communication and computation costs, leading to overall system speedups of 10.95x, 3.21x, and 12.85x over CPUs, GPUs, and a prior-art eye tracking processor called CIS-GEP, respectively, while maintaining tracking accuracy.
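The predict-then-focus idea described above can be illustrated with a minimal sketch. This is not the EyeCoD implementation: the abstract states that both stages are learned models, whereas the sketch below uses simple hypothetical stand-ins (a darkness threshold for segmentation and a weighted centroid for gaze estimation), and the function names, the 32x32 ROI size, and the synthetic frame are all assumptions chosen for illustration. It only shows the control flow, in which the second stage touches the cropped ROI rather than the full frame.

```python
import numpy as np


def predict_roi(image, roi_size):
    """Stage 1 (toy stand-in for a segmentation model).

    Marks unusually dark pixels and returns a fixed-size box
    (top, left, height, width) centred on them.
    Assumes the image is at least as large as roi_size.
    """
    mask = image < image.mean() - image.std()
    if mask.any():
        ys, xs = np.nonzero(mask)
        cy, cx = int(ys.mean()), int(xs.mean())
    else:
        # Nothing stands out: fall back to the image centre.
        cy, cx = image.shape[0] // 2, image.shape[1] // 2
    h, w = roi_size
    # Clamp the box so it stays inside the image.
    top = min(max(cy - h // 2, 0), image.shape[0] - h)
    left = min(max(cx - w // 2, 0), image.shape[1] - w)
    return top, left, h, w


def estimate_gaze(roi):
    """Stage 2 (toy stand-in for a gaze estimation model).

    Returns the darkness-weighted centroid of the ROI as an (x, y)
    offset from the ROI centre, normalised to roughly [-1, 1].
    """
    weights = roi.max() - roi
    total = weights.sum()
    if total == 0:
        return 0.0, 0.0
    ys, xs = np.indices(roi.shape)
    cy = (weights * ys).sum() / total
    cx = (weights * xs).sum() / total
    h, w = roi.shape
    return float(2 * cx / (w - 1) - 1), float(2 * cy / (h - 1) - 1)


def predict_then_focus(image, roi_size=(32, 32)):
    """Predict the ROI first, then run gaze estimation on the crop only."""
    top, left, h, w = predict_roi(image, roi_size)
    roi = image[top:top + h, left:left + w]
    return (top, left, h, w), estimate_gaze(roi)


# Synthetic 128x128 frame with a dark 10x10 "pupil" patch.
frame = np.ones((128, 128))
frame[40:50, 80:90] = 0.0
box, gaze = predict_then_focus(frame)
roi_fraction = (box[2] * box[3]) / frame.size
```

In this toy setup the second stage processes a 32x32 crop of a 128x128 frame, i.e. one sixteenth of the pixels, which is the kind of reduction in computation and data movement the pipeline targets. The actual savings in EyeCoD depend on its real models and ROI sizes, which the abstract does not specify.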