With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph whose nodes are operators transforming multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, and some of these configurations lead to faster inference. We propose TGraph, a neural graph architecture that screens for fast configurations of the target computational graph, thus serving as an artificial intelligence (AI) tensor compiler in contrast to traditional heuristics-based compilers. The proposed solution improves the mean Kendall's $\tau$ across the layout collections of TpuGraphs from 29.8% for the reliable baseline to 67.4% for TGraph. We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.
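The headline metric is Kendall's $\tau$, which scores how well a model's ranking of configurations agrees with their measured runtimes. The abstract does not spell out the exact variant, so the following is a minimal sketch assuming Kendall's tau-a with no ties, $\tau = (C - D) / \binom{n}{2}$ over concordant ($C$) and discordant ($D$) pairs. It also assumes the reported mean is an unweighted average of per-graph $\tau$ values within a collection. The helper names `kendall_tau` and `mean_kendall_tau` are hypothetical. The TpuGraphs evaluation may use a different variant (for example tau-b, as computed by `scipy.stats.kendalltau`).

```python
from itertools import combinations


def kendall_tau(predicted, measured):
    """Kendall's tau-a between predicted scores and measured runtimes.

    Hypothetical helper; assumes no tied values in either list.
    """
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = 0
    # Compare every unordered pair of configurations (i, j).
    for i, j in combinations(range(len(predicted)), 2):
        sign = (predicted[i] - predicted[j]) * (measured[i] - measured[j])
        if sign > 0:
            concordant += 1  # both rankings order the pair the same way
        elif sign < 0:
            discordant += 1  # the rankings disagree on this pair
    n_pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / n_pairs


def mean_kendall_tau(collection):
    """Unweighted mean of per-graph tau over (predicted, measured) pairs.

    Hypothetical helper; the averaging scheme is an assumption.
    """
    taus = [kendall_tau(p, m) for p, m in collection]
    return sum(taus) / len(taus)


if __name__ == "__main__":
    perfect = ([0.1, 0.4, 0.3], [1.0, 3.0, 2.0])   # same ordering: tau = 1.0
    reversed_ = ([0.1, 0.4, 0.3], [3.0, 1.0, 2.0])  # opposite ordering: tau = -1.0
    print(mean_kendall_tau([perfect, reversed_]))   # prints 0.0
```

A value of 1 means the model orders every pair of configurations exactly as measured, and a value of 0 means its ordering is no better than chance on average. Under this reading, moving from 29.8% to 67.4% corresponds to a much larger share of correctly ordered configuration pairs.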