Apache TVM (Tensor Virtual Machine), an open-source machine learning compiler framework designed to optimize computations across various hardware platforms, provides an opportunity to improve the performance of dense matrix factorizations such as LU (lower-upper) decomposition and Cholesky decomposition on GPUs and AI (artificial intelligence) accelerators. In this paper, we propose a new TVM autotuning framework based on Bayesian optimization and use the TVM tensor expression language to implement linear algebra kernels such as LU, Cholesky, and 3mm. We use these scientific computation kernels to evaluate the effectiveness of our methods on Swing, a GPU cluster at Argonne National Laboratory. We compare the proposed autotuning framework with AutoTVM, the TVM autotuning framework, run with four tuners, and find that our framework outperforms AutoTVM in most cases.
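For readers unfamiliar with the three kernels named above, the following is a minimal NumPy reference sketch of what each one computes. It is not the TVM tensor expression implementation described in the paper; the function names (`three_mm`, `lu_nopivot`, `cholesky_lower`) are hypothetical, the LU variant assumes no pivoting is needed, and 3mm is assumed to follow the usual PolyBench definition G = (A·B)·(C·D). A reference of this kind could serve as a correctness check for a tuned kernel.

```python
import numpy as np


def three_mm(A, B, C, D):
    """3mm (assumed PolyBench form): G = (A @ B) @ (C @ D)."""
    E = A @ B
    F = C @ D
    return E @ F


def lu_nopivot(A):
    """Doolittle LU without pivoting: A = L @ U, with unit-diagonal L.

    Assumes every pivot U[k, k] is non-zero (e.g. A is diagonally dominant).
    """
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for k in range(n - 1):
        # Multipliers that eliminate column k below the diagonal.
        L[k + 1:, k] = U[k + 1:, k] / U[k, k]
        # Rank-1 update of the trailing submatrix.
        U[k + 1:, k:] -= np.outer(L[k + 1:, k], U[k, k:])
    # Clear rounding residue below the diagonal.
    return L, np.triu(U)


def cholesky_lower(A):
    """Cholesky factorization A = L @ L.T for symmetric positive definite A."""
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        # Diagonal entry: remove contributions of already computed columns.
        d = A[j, j] - L[j, :j] @ L[j, :j]
        L[j, j] = np.sqrt(d)
        # Entries below the diagonal in column j.
        L[j + 1:, j] = (A[j + 1:, j] - L[j + 1:, :j] @ L[j, :j]) / L[j, j]
    return L
```

The loops are written column by column so that the data dependences between iterations are visible. These dependences are what make LU and Cholesky harder to schedule than a plain matrix product such as 3mm.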