Recent advances in sensors have led to high resolution and high data throughput at the pixel level. Simultaneously, the adoption of increasingly large (deep) neural networks (NNs) has led to significant progress in computer vision. Currently, however, visual intelligence comes at the cost of increasingly high computational complexity, energy, and latency. We study a data-driven system that combines dynamic sensing at the pixel level with computer vision analytics at the video level, and we propose a feedback control loop that minimizes data movement between the sensor front-end and the computational back-end without compromising detection and tracking precision. Our contributions are threefold: (1) we introduce anticipatory attention and show that it leads to high-precision prediction with sparse activation of pixels; (2) leveraging the feedback control, we show that the dimensionality of learned feature vectors can be significantly reduced with increased sparsity; and (3) we emulate analog design choices (such as varying RGB or Bayer pixel format and analog noise) and study their impact on the key metrics of the data-driven system. Comparative analysis with traditional pixel and deep learning models shows significant performance enhancements. Our system achieves a 10X reduction in bandwidth and a 15-30X improvement in Energy-Delay Product (EDP) when activating only 30% of pixels, with a minor reduction in object detection and tracking precision. Based on analog emulation, our system can achieve a throughput of 205 megapixels/sec (MP/s) with a power consumption of only 110 mW per MP, i.e., a theoretical improvement of ~30X in EDP.
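For reference, the Energy-Delay Product quoted in the abstract is conventionally defined as the energy consumed by a task multiplied by the time taken to complete it. The following is a minimal sketch of how an EDP improvement factor decomposes under that conventional definition. The symbols $E$, $D$, $g_E$, and $g_D$ are notation introduced here for illustration and do not appear in the source.

```latex
\begin{align}
  \mathrm{EDP} &= E \cdot D, \\
  \frac{\mathrm{EDP}_{\text{base}}}{\mathrm{EDP}_{\text{new}}}
    &= \frac{E_{\text{base}}}{E_{\text{new}}} \cdot \frac{D_{\text{base}}}{D_{\text{new}}}
     = g_E \, g_D .
\end{align}
```

Under this definition, an overall EDP improvement such as the reported 15-30X is the product of the energy gain $g_E$ and the delay gain $g_D$. The abstract does not state how that product splits between the two factors.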