We address the challenges of deploying neural networks on CPUs, focusing on minimizing inference time while maintaining accuracy. Our novel approach uses the dataflow (i.e., the computation order) of a neural network to explore data reuse opportunities through heuristic-guided analysis and a code generation framework, which together enable exploration of various Single Instruction, Multiple Data (SIMD) implementations for optimized neural network execution. Our results show that the dataflow that keeps outputs in SIMD registers while also maximizing both input and weight reuse consistently yields the best performance across a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks and up to 4.8x speedup for binary neural networks over today's optimized neural network implementations.
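To make the notion of dataflow concrete, the following minimal sketch contrasts two loop orders for a 1-D convolution. It is an illustration under stated assumptions, not the code generation framework described above: the function names, the `lanes` parameter, and the store counter are hypothetical, and a plain Python list stands in for a SIMD register. In the output-stationary order, partial sums stay in the accumulator until they are complete, so each output is stored once; in the weight-stationary order, each output is updated in memory once per weight.

```python
def conv1d_output_stationary(x, w, lanes=4):
    """1-D convolution with an output-stationary loop order (illustrative only)."""
    n_out = len(x) - len(w) + 1
    out = [0] * n_out
    out_stores = 0  # counts writes to the output array
    for base in range(0, n_out, lanes):
        width = min(lanes, n_out - base)
        acc = [0] * width  # assumption: models one SIMD register of partial sums
        for k, wk in enumerate(w):  # each weight is reused across all lanes
            for j in range(width):
                acc[j] += x[base + j + k] * wk
        out[base:base + width] = acc  # each output element is stored exactly once
        out_stores += width
    return out, out_stores


def conv1d_weight_stationary(x, w):
    """Same convolution, but each weight is held fixed while outputs are updated in memory."""
    n_out = len(x) - len(w) + 1
    out = [0] * n_out
    out_stores = 0
    for k, wk in enumerate(w):
        for i in range(n_out):
            out[i] += x[i + k] * wk  # read-modify-write of the output per weight
            out_stores += 1
    return out, out_stores


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    w = [1, 0, -1]
    print(conv1d_output_stationary(x, w))
    print(conv1d_weight_stationary(x, w))
```

Both functions compute the same result; only the computation order, and therefore the number of output stores, differs. The sketch says nothing about real execution time, which depends on the target CPU and its SIMD instruction set.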