The rapid advancement of neural network applications calls for hardware that not only accelerates computation but also adapts efficiently to dynamic processing requirements. Processing-in-pixel has emerged as a promising way to overcome the bottlenecks of traditional architectures at the extreme edge, but existing implementations are limited in reconfigurability and scalability because of their static nature and inefficient use of area. To address these challenges, we present a novel architecture that significantly enhances the capabilities of processing-in-pixel for convolutional neural networks. Our design integrates non-volatile memory (NVM) with a novel unit pixel circuit, enabling dynamic reconfiguration of synaptic weights, kernel size, channel size, and stride, and thus offering a high degree of flexibility and adaptability. By using a separate die for the pixel circuit and synaptic-weight storage, our circuit substantially reduces the required area per pixel, thereby increasing the density and scalability of the pixel array. Simulation results demonstrate the circuit's dot-product operation and the non-linearity of its analog output, and we propose a novel bucket-select curve-fit model to capture this non-linearity. This work not only addresses the limitations of current in-pixel computing approaches but also opens new avenues for developing more efficient, flexible, and scalable neural network hardware for advanced AI applications.