The high volume of data transmitted between the edge sensor and the cloud processor creates energy and throughput bottlenecks for resource-constrained edge devices that perform computer vision. Researchers are therefore investigating approaches that execute computations closer to the sensor (e.g., near-sensor, in-sensor, and in-pixel processing) to reduce the transmission bandwidth. Specifically, in-pixel processing for neuromorphic vision sensors, such as dynamic vision sensors (DVS), incorporates asynchronous multiply-accumulate (MAC) operations within the pixel array, which improves energy efficiency. In a CMOS implementation, a low-overhead, energy-efficient analog MAC accumulates charge on a passive capacitor; however, the capacitor's limited charge retention time constrains the choice of algorithmic integration time, which in turn affects algorithmic accuracy, bandwidth, energy, and training efficiency. This creates a hardware design trade-off: the compute unit must be low-leakage while preserving the area and energy benefits. In this work, we present a holistic analysis of the hardware-algorithm co-design trade-off imposed by the limited integration time of the hardware, along with techniques to improve the leakage performance of in-pixel analog MAC operations.
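To make the link between charge retention and integration time concrete, the following is a minimal sketch rather than the model used in this work. It assumes, purely for illustration, that each accumulated contribution decays exponentially with a single retention time constant `tau`; real capacitor leakage need not follow this form. The names `leaky_mac`, `retained_fraction`, `tau`, and `t_int`, and the 10 ms value, are all hypothetical.

```python
import math


def leaky_mac(event_times, weights, tau, t_read):
    """Value held on a leaky accumulator at time t_read.

    Assumption: each event adds its weight at its arrival time, and that
    contribution then decays as exp(-dt / tau) until it is read out.
    """
    total = 0.0
    for t, w in zip(event_times, weights):
        if t <= t_read:
            total += w * math.exp(-(t_read - t) / tau)
    return total


def retained_fraction(t_int, tau, n_events=100):
    """Leaky / ideal accumulation for unit-weight events spread evenly over [0, t_int)."""
    times = [i * t_int / n_events for i in range(n_events)]
    weights = [1.0] * n_events
    return leaky_mac(times, weights, tau, t_int) / n_events


if __name__ == "__main__":
    tau = 10e-3  # hypothetical retention time constant (10 ms), not taken from the paper
    for t_int in (1e-3, 10e-3, 100e-3):
        frac = retained_fraction(t_int, tau)
        print(f"t_int = {t_int * 1e3:6.1f} ms -> retained fraction {frac:.3f}")
```

Under this simplified model, the retained fraction falls as the integration time grows relative to `tau`, which is one way to see why a longer algorithmic integration window calls for a lower-leakage compute unit.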