The deployment and training of neural networks on edge computing devices pose many challenges. Limited memory is often one of the biggest constraints encountered when deploying large neural network models on such devices. Tensor rematerialization, or recomputation, addresses the high memory requirements of neural network training and inference by trading additional compute for reduced memory. In this paper, we consider the problem of minimizing the execution time of compute graphs subject to a memory budget. In particular, we develop a new constraint programming formulation called \textsc{Moccasin} with only $O(n)$ integer variables, where $n$ is the number of nodes in the compute graph. This is a significant improvement over recent works that propose formulations with $O(n^2)$ Boolean variables. We present numerical studies showing that our approach is up to an order of magnitude faster than recent work, especially for large-scale graphs.
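To make the trade-off between execution time and memory concrete, the following is a minimal sketch that evaluates a given execution sequence on a toy compute graph. It is not the \textsc{Moccasin} formulation: the `Node` class, the `evaluate` function, the four-node graph, and the memory model (an output stays in memory until its last consumer before the next recomputation of the same node) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    duration: int      # time to compute this node once
    size: int          # memory held by its output tensor
    preds: tuple = ()  # names of the nodes whose outputs it consumes


def evaluate(graph, sequence):
    """Return (total_duration, peak_memory) for an execution sequence.

    A node may appear more than once in `sequence`; each repeat is a
    rematerialization. Assumed memory model: the output produced at a
    step is kept until the last step that consumes that copy.
    """
    # Each output is live at least during the step that produces it.
    last_use = list(range(len(sequence)))
    latest = {}  # node name -> step of its most recent computation
    for step, name in enumerate(sequence):
        for pred in graph[name].preds:
            if pred not in latest:
                raise ValueError(f"{name} scheduled before its input {pred}")
            # The consumer reads the most recent copy of its input.
            last_use[latest[pred]] = step
        latest[name] = step

    peak = 0
    for step in range(len(sequence)):
        live = sum(
            graph[sequence[i]].size
            for i in range(step + 1)
            if last_use[i] >= step
        )
        peak = max(peak, live)

    total = sum(graph[name].duration for name in sequence)
    return total, peak


# Hypothetical graph: a chain a -> b -> c -> d where d also consumes a.
graph = {
    "a": Node(duration=1, size=4),
    "b": Node(duration=2, size=4, preds=("a",)),
    "c": Node(duration=2, size=4, preds=("b",)),
    "d": Node(duration=1, size=1, preds=("c", "a")),
}

no_remat = evaluate(graph, ["a", "b", "c", "d"])         # keep a alive throughout
with_remat = evaluate(graph, ["a", "b", "c", "a", "d"])  # recompute a before d
print(no_remat, with_remat)
```

In this toy instance, recomputing `a` lowers the peak memory from 12 to 9 units at the cost of one extra time unit, so under a memory budget of 9 only the sequence with rematerialization is feasible. An optimizer for the problem described above would search over such sequences for the one with the smallest total duration that fits the budget.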