Convolutional Neural Networks (CNNs) are crucial in many applications, but deploying them on resource-constrained edge devices is challenging. This study presents Sum-of-Products (SOP) units for convolution that use low-latency, left-to-right (most-significant-bit-first) bit-serial arithmetic to minimize response time. It also proposes a methodology for fusing multiple convolution layers to reduce off-chip memory communication and improve overall performance. An effective mechanism detects and skips ineffectual convolutions after ReLU layers, reducing power consumption without compromising accuracy, and efficient tile movement guarantees uniform access to the fusion pyramid. An analysis demonstrates that the adopted stride strategy improves operational intensity. Two designs cater to different demands: one targets minimal response time for mission-critical applications; the other targets resource-constrained devices while achieving comparable latency. The approach markedly reduces redundant computation, improving the efficiency of CNN deployment on edge devices.
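To make the left-to-right bit-serial idea concrete, the following is a minimal software sketch (not the paper's hardware design) of an MSB-first bit-serial sum-of-products: each step consumes one bit plane of the unsigned activations, shifts the running total, and adds that plane's partial sum of weights. The function name, bit width, and unsigned-activation assumption are illustrative choices, not taken from the source.

```python
def sop_bit_serial_msb_first(activations, weights, bits=8):
    """Sum-of-products computed left-to-right (MSB-first) bit-serially.

    Assumes unsigned integer activations of at most `bits` bits; weights
    may be signed. Equivalent to sum(a * w for a, w in zip(...)), but
    processed one activation bit plane per step, MSB first, as a
    bit-serial SOP unit would.
    """
    acc = 0
    for b in range(bits - 1, -1, -1):  # MSB -> LSB
        # Partial sum for this bit plane: weights whose activation bit is set.
        plane = sum(w for a, w in zip(activations, weights) if (a >> b) & 1)
        acc = (acc << 1) + plane       # shift-accumulate
    return acc
```

Processing the most significant bits first means the accumulator converges toward the final value early, which is what enables low-latency early decisions in such designs.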