Deep Neural Networks (DNNs) require highly efficient matrix multiplication engines for complex computations. This paper presents a systolic array architecture incorporating novel exact and approximate processing elements (PEs), designed using energy-efficient positive partial product and negative partial product cells, termed PPC and NPPC, respectively. The proposed 8-bit exact and approximate PE designs are employed in an 8×8 systolic array, which achieves energy savings of 22% and 32%, respectively, compared to the existing design. To demonstrate their effectiveness, the proposed PEs are integrated into a systolic array (SA) for Discrete Cosine Transform (DCT) computation, achieving high output quality with a peak signal-to-noise ratio (PSNR) of 38.21 dB. Furthermore, in an edge detection application using convolution, the approximate PE achieves a PSNR of 30.45 dB. These results highlight the potential of the proposed design to deliver significant energy efficiency while maintaining competitive output quality, making it well-suited for error-resilient image and vision processing applications.
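The abstract does not specify the array's dataflow or the internal structure of the PPC and NPPC cells. As a minimal sketch, assuming an output-stationary dataflow with skewed operand injection, the Python model below shows how an 8×8 grid of multiply-accumulate PEs computes a matrix product cycle by cycle. The function name `systolic_matmul` and its `mul` hook are hypothetical, and the default `mul` is a plain exact integer multiply, not the proposed PE design.

```python
import numpy as np


def systolic_matmul(A, B, mul=lambda a, b: a * b):
    """Cycle-level sketch of an output-stationary systolic array.

    Assumption (not from the paper): PE(i, j) keeps a local accumulator for
    C[i, j]; A operands move left-to-right and B operands move top-to-bottom,
    with row i of A delayed by i cycles and column j of B delayed by j cycles.
    `mul` is a hypothetical stand-in for the PE multiplier (exact by default).
    """
    A = np.asarray(A, dtype=np.int64)
    B = np.asarray(B, dtype=np.int64)
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"

    acc = np.zeros((n, m), dtype=np.int64)    # per-PE accumulators (output-stationary)
    a_reg = np.zeros((n, m), dtype=np.int64)  # A operand latched in each PE
    b_reg = np.zeros((n, m), dtype=np.int64)  # B operand latched in each PE

    # PE(i, j) sees A[i, s] and B[s, j] at cycle t = s + i + j, so the last
    # product reaches PE(n-1, m-1) at cycle (k-1) + (n-1) + (m-1).
    for t in range(k + n + m - 2):
        # Shift operands one PE to the right (A) and one PE down (B).
        a_reg[:, 1:] = a_reg[:, :-1].copy()
        b_reg[1:, :] = b_reg[:-1, :].copy()
        # Inject skewed inputs at the left and top edges; zero when out of range.
        for i in range(n):
            s = t - i
            a_reg[i, 0] = A[i, s] if 0 <= s < k else 0
        for j in range(m):
            s = t - j
            b_reg[0, j] = B[s, j] if 0 <= s < k else 0
        # Every PE performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(m):
                acc[i, j] += mul(int(a_reg[i, j]), int(b_reg[i, j]))
    return acc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.integers(-128, 128, size=(8, 8))  # signed 8-bit operands
    B = rng.integers(-128, 128, size=(8, 8))
    C = systolic_matmul(A, B)
    print(np.array_equal(C, A @ B))  # True
```

Passing a different `mul` (for example, a hypothetical truncated multiplier) is one way to compare an approximate PE against the exact one on the same workload; the dataflow and skewing here are illustrative choices, not a description of the paper's hardware.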