Deploying Convolutional Neural Networks (CNNs) on edge platforms requires efficient hardware acceleration. Any unnecessary data movement in such accelerators can unacceptably degrade performance and efficiency. To address this, we develop a layer fusion technique for CNNs that reduces off-chip data communication by applying a Genetic Algorithm (GA) to graph-based topological sort. Results show a 1.8$\times$ increase in energy efficiency and a 1.9$\times$ improvement in energy-delay product (EDP) for MobileNet-v3 on a SIMBA-like mobile architecture. Our approach consistently improves workload performance, with an average EDP improvement of 1.4$\times$ for SIMBA and 1.12$\times$ for Eyeriss.
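To make the idea of a GA searching over topological orderings concrete, the following is a minimal sketch and not the implementation evaluated here. The layer graph, the tensor sizes, the function names (`decode`, `cost`, `evolve`), and the cost model are all hypothetical: the sketch assumes a producer's output stays on-chip only when its consumer is scheduled immediately after it, and counts every other edge as off-chip traffic. Each individual is a vector of random keys that Kahn's algorithm uses to break ties, so every individual decodes to a valid topological order.

```python
import heapq
import random

# Hypothetical toy layer graph: each node maps to the layers consuming its output.
GRAPH = {
    "conv1": ["conv2", "conv3"],
    "conv2": ["add"],
    "conv3": ["conv4"],
    "conv4": ["add"],
    "add": ["pool"],
    "pool": [],
}
# Hypothetical output tensor sizes (arbitrary units).
OUT_SIZE = {"conv1": 8, "conv2": 4, "conv3": 6, "conv4": 4, "add": 4, "pool": 1}


def decode(keys):
    """Kahn's algorithm; among ready layers, the lowest key is scheduled first."""
    indeg = {n: 0 for n in GRAPH}
    for succs in GRAPH.values():
        for s in succs:
            indeg[s] += 1
    ready = [(keys[n], n) for n in GRAPH if indeg[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, n = heapq.heappop(ready)
        order.append(n)
        for s in GRAPH[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, (keys[s], s))
    return order


def cost(order):
    """Assumed proxy for off-chip traffic: an edge costs the producer's output
    size unless the consumer runs immediately after the producer."""
    pos = {n: i for i, n in enumerate(order)}
    return sum(
        OUT_SIZE[u]
        for u, succs in GRAPH.items()
        for v in succs
        if pos[v] - pos[u] > 1
    )


def evolve(generations=30, pop_size=20, seed=0):
    """Elitist GA over random-key chromosomes; returns (best order, its cost)."""
    rng = random.Random(seed)
    nodes = list(GRAPH)

    def fitness(keys):
        return cost(decode(keys))

    pop = [{n: rng.random() for n in nodes} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each key is inherited from one parent.
            child = {n: (a[n] if rng.random() < 0.5 else b[n]) for n in nodes}
            # Mutation: resample the key of one randomly chosen layer.
            child[rng.choice(nodes)] = rng.random()
            children.append(child)
        pop = parents + children
    best = min(pop, key=fitness)
    return decode(best), fitness(best)


if __name__ == "__main__":
    print(evolve())
```

The random-key encoding is a design choice made for this sketch: crossover and mutation act on plain numeric vectors, and validity of the schedule is guaranteed by the decoder rather than by repair steps. A realistic fitness function would come from an accelerator cost model for energy and latency rather than the adjacency proxy used above.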