We address the challenges of deploying neural networks on CPUs, with a particular focus on minimizing inference time while maintaining accuracy. Our novel approach uses the dataflow (i.e., the computation order) of a neural network to explore data reuse opportunities through heuristic-guided analysis and a code generation framework, which together enable exploration of various Single Instruction, Multiple Data (SIMD) implementations for optimized neural network execution. Our results demonstrate that the dataflow that keeps outputs in SIMD registers while also maximizing both input and weight reuse consistently yields the best performance across a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks and up to 4.8x speedup for binary neural networks over today's optimized neural network implementations.
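To make the idea of an output-stationary dataflow concrete, the following is a minimal sketch in plain Python, not the code generation framework described above. It assumes a 1-D integer convolution as the workload; the function names, the `lanes` parameter, and the use of a short Python list as a stand-in for one SIMD register are all illustrative assumptions. The point is only the loop order: a block of partial sums stays in the accumulator until it is complete, each weight is fetched once per block and reused across every lane, and neighbouring outputs reuse overlapping inputs.

```python
def conv1d_reference(x, w):
    """Plain 1-D 'valid' convolution (correlation form), one output at a time."""
    n_out = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n_out)]


def conv1d_output_stationary(x, w, lanes=4):
    """Same result, reordered so a block of `lanes` outputs stays in accumulators.

    `lanes` is a hypothetical stand-in for the number of elements that fit
    in one SIMD register; `acc` plays the role of that register.
    """
    n_out = len(x) - len(w) + 1
    y = [0] * n_out
    for base in range(0, n_out, lanes):
        width = min(lanes, n_out - base)  # the last block may be partial
        acc = [0] * width                 # partial sums held "in registers"
        for k, wk in enumerate(w):        # each weight fetched once per block
            for lane in range(width):     # and reused across all lanes
                acc[lane] += x[base + lane + k] * wk
        y[base:base + width] = acc        # single write-back per output block
    return y


x = [1, 2, 3, 4, 5, 6, 7]
w = [1, 0, -1]
assert conv1d_output_stationary(x, w) == conv1d_reference(x, w)
```

In a real SIMD implementation the inner `lane` loop would presumably collapse into a single vector multiply-accumulate, and the choice of which operand stays resident (outputs, inputs, or weights) is what distinguishes one dataflow from another.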