Ultra-high-resolution image sensors offer the potential to capture fine spatial details critical for many visual perception tasks, but acquiring and processing all pixels at full resolution is often infeasible under realistic bandwidth, latency, and power constraints. Existing approaches address this challenge through acquisition strategies such as spatial or temporal downsampling, which irrevocably discard information before task relevance can be assessed. In this work, we introduce a real-time, predictive, and task-aware foveated imaging system that operates directly at image acquisition time. Leveraging emerging dual-stream sensor architectures, our method dynamically allocates limited pixel bandwidth to task-relevant regions of interest while maintaining a low-resolution global context. We formulate foveated acquisition as a sensor attention policy-learning problem, in which past observations guide actions that determine future measurements, closing the perception-acquisition loop. Through extensive simulation across multiple perception tasks, we demonstrate that our approach achieves high task performance under strict pixel budgets and significantly outperforms relevant baselines operating at the same bandwidth. We further validate our system on a 200-megapixel dual-stream sensor, capturing real-world videos under realistic bandwidth and latency constraints, demonstrating the practical feasibility of task-driven, acquisition-time foveated imaging.
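As a rough illustration of the closed perception-acquisition loop described in the abstract, the following Python sketch simulates a dual-stream readout on a toy grayscale frame: one stream is a block-averaged global context, the other a full-resolution crop whose position was chosen from the previous frame. Everything here is an assumption made for illustration, not the authors' implementation. The names `Roi`, `downsample`, `crop`, `pick_roi`, `run_loop`, and `pixels_read` are hypothetical, and the "brightest context cell" rule is only a stand-in for the learned sensor attention policy.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = List[List[float]]  # row-major grayscale image


@dataclass(frozen=True)
class Roi:
    x: int  # left column, in full-resolution coordinates
    y: int  # top row, in full-resolution coordinates
    w: int
    h: int


def downsample(frame: Frame, factor: int) -> Frame:
    """Block-average by `factor` per axis (the low-resolution context stream)."""
    rows, cols = len(frame), len(frame[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [frame[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def crop(frame: Frame, roi: Roi) -> Frame:
    """Read the ROI at full resolution (the high-resolution stream)."""
    return [row[roi.x:roi.x + roi.w] for row in frame[roi.y:roi.y + roi.h]]


def pick_roi(context: Frame, factor: int, roi_w: int, roi_h: int,
             width: int, height: int) -> Roi:
    """Toy policy (assumption): centre the next ROI on the brightest context cell."""
    cells = ((r, c) for r in range(len(context)) for c in range(len(context[0])))
    best_r, best_c = max(cells, key=lambda rc: context[rc[0]][rc[1]])
    cx = best_c * factor + factor // 2
    cy = best_r * factor + factor // 2
    # Clamp so the ROI stays inside the sensor area.
    x = min(max(cx - roi_w // 2, 0), width - roi_w)
    y = min(max(cy - roi_h // 2, 0), height - roi_h)
    return Roi(x, y, roi_w, roi_h)


def run_loop(frames: List[Frame], factor: int, roi_w: int,
             roi_h: int) -> List[Tuple[Roi, Frame, Frame]]:
    """Past observations choose the ROI that is measured on the next frame."""
    height, width = len(frames[0]), len(frames[0][0])
    roi = Roi((width - roi_w) // 2, (height - roi_h) // 2, roi_w, roi_h)
    outputs = []
    for frame in frames:
        context = downsample(frame, factor)
        patch = crop(frame, roi)  # ROI was decided before this frame arrived
        outputs.append((roi, context, patch))
        roi = pick_roi(context, factor, roi_w, roi_h, width, height)
    return outputs


def pixels_read(context: Frame, patch: Frame) -> int:
    """Total pixels transferred for one frame across both streams."""
    return sum(len(r) for r in context) + sum(len(r) for r in patch)
```

In this toy setting, a 16×16 frame read with a downsampling factor of 4 and a 4×4 ROI transfers 16 + 16 = 32 pixels per frame instead of 256. Because the ROI for each frame is fixed before that frame is observed, the policy has to be predictive, which is the property the abstract emphasizes.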