The vast amount of data generated by camera sensors has prompted the exploration of energy-efficient processing solutions for deploying computer vision tasks on edge devices. Among the approaches studied, processing-in-pixel integrates massively parallel analog computation at the extreme edge, i.e., within the pixel array, and improves energy and bandwidth efficiency by generating the output activations of the first neural network layer rather than the raw sensory data. In this article, we propose an energy- and bandwidth-efficient ADC-less processing-in-pixel architecture. This architecture implements an optimized binary-activation neural network trained using a Hoyer regularizer for high accuracy on complex vision tasks. We also introduce a global-shutter burst memory read scheme that achieves fast, disturb-free read operations through an innovative use of nanoscale voltage-controlled magnetic tunnel junctions (VC-MTJs). Moreover, we develop an algorithmic framework that incorporates device and circuit constraints (characteristic device switching behavior and circuit non-linearity), based on state-of-the-art fabricated VC-MTJ characteristics and extensive circuit simulations in commercial GlobalFoundries 22nm FDX technology. Finally, we evaluate the proposed system on two complex datasets, CIFAR10 and ImageNet, showing improvements in front-end and communication energy efficiency of 8.2x and 8.5x, respectively, and a 6x reduction in bandwidth compared to traditional computer vision systems, without any significant drop in test accuracy.
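To make the Hoyer regularizer mentioned above concrete, the following is a minimal sketch of the commonly used Hoyer-square measure, the squared L1 norm divided by the squared L2 norm, together with a simple threshold-based binary activation. The abstract does not state the exact form of the regularizer or how the threshold is chosen, so the function names `hoyer_square` and `binary_activation`, the `eps` term, and the fixed threshold are assumptions made purely for illustration.

```python
def hoyer_square(values, eps=1e-12):
    """Hoyer-square measure: (sum |v|)^2 / (sum v^2).

    Assumed form, not taken from the abstract. It ranges from about 1
    (a single non-zero entry) to len(values) (all entries of equal
    magnitude), so penalizing it encourages sparse activations.
    """
    l1 = sum(abs(v) for v in values)
    l2_squared = sum(v * v for v in values)
    # eps (hypothetical) guards against division by zero for all-zero input.
    return (l1 * l1) / (l2_squared + eps)


def binary_activation(values, threshold):
    """Map each pre-activation to 0 or 1 using a fixed (hypothetical) threshold."""
    return [1 if v >= threshold else 0 for v in values]


if __name__ == "__main__":
    sparse = [0.0, 0.0, 3.0, 0.0]
    dense = [1.0, 1.0, 1.0, 1.0]
    print(hoyer_square(sparse))  # close to 1: maximally sparse
    print(hoyer_square(dense))   # close to 4: maximally dense
    print(binary_activation([0.2, 0.8, 0.5], threshold=0.5))
```

In a training loop, a term of this kind would typically be added to the task loss with a small weighting coefficient. That weighting, and which tensors are regularized, are design choices the abstract does not specify.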