As deep learning advances, edge devices and lightweight neural networks are becoming more important. To reduce latency on an AI accelerator, it is essential not only to reduce FLOPs but also to improve hardware performance. We propose arithmetic intensity balancing convolution (ABConv) to address the problem that, for convolutions with a small spatial size, the overall arithmetic intensity is limited by the small weight arithmetic intensity. ABConv raises the upper bound on overall arithmetic intensity and significantly reduces latency without sacrificing accuracy. We measured the latency and hardware performance of ABConv on the Arm Ethos-U65 NPU in various configurations, and used it to replace parts of MobileNetV1 and ResNet50 for image classification on CIFAR100.
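The bottleneck described in the abstract can be illustrated with a rough estimate. The sketch below is not the ABConv method itself. It assumes a standard dense convolution, counts 2 operations per multiply-accumulate, assumes each weight and activation element is moved once, and defines arithmetic intensity as operations per byte. The function name `conv_arithmetic_intensity` and its parameters are hypothetical, and the paper's exact definitions may differ.

```python
def conv_arithmetic_intensity(h_out, w_out, c_in, c_out, k, bytes_per_elem=1):
    """Estimate (weight, activation, overall) arithmetic intensity in ops/byte.

    Assumptions (illustrative only): stride 1 with 'same' padding, so the
    input and output share the same spatial size; every tensor element is
    read or written exactly once.
    """
    macs = h_out * w_out * c_in * c_out * k * k
    ops = 2 * macs  # one multiply and one add per MAC

    weight_bytes = k * k * c_in * c_out * bytes_per_elem
    # Input feature map plus output feature map.
    act_bytes = (h_out * w_out * c_in + h_out * w_out * c_out) * bytes_per_elem

    ai_weight = ops / weight_bytes        # simplifies to 2 * h_out * w_out
    ai_act = ops / act_bytes
    ai_overall = ops / (weight_bytes + act_bytes)
    return ai_weight, ai_act, ai_overall


if __name__ == "__main__":
    # Small spatial size: each weight is reused only h_out * w_out times.
    small = conv_arithmetic_intensity(h_out=2, w_out=2, c_in=512, c_out=512, k=3)
    # Large spatial size: each weight is reused far more often.
    large = conv_arithmetic_intensity(h_out=56, w_out=56, c_in=64, c_out=64, k=3)
    for name, (ai_w, ai_a, ai_o) in (("small", small), ("large", large)):
        print(f"{name}: weight={ai_w:.1f} activation={ai_a:.1f} overall={ai_o:.2f}")
```

Under these assumptions the weight arithmetic intensity depends only on the output spatial size, and the overall intensity can never exceed the smaller of the weight and activation intensities. For a 2×2 output, the weight term therefore caps the overall value regardless of how many channels the layer has.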