Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven demand for deploying them to a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program. It also introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and library calls in a single representation to enable cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on large language models show that Relax delivers performance competitive with state-of-the-art hand-optimized systems across platforms and enables deployment of emerging dynamic models to a broader set of environments, including mobile phones, embedded devices, and web browsers.