We introduce Mirage, the first multi-level superoptimizer for tensor programs. A key idea in Mirage is $\mu$Graphs, a uniform representation of tensor programs at the kernel, thread block, and thread levels of the GPU compute hierarchy. $\mu$Graphs enable Mirage to discover novel optimizations that combine algebraic transformations, schedule transformations, and generation of new custom kernels. To navigate the large search space, Mirage introduces a pruning technique based on abstraction that significantly reduces the search space and provides a certain optimality guarantee. To ensure that the optimized $\mu$Graph is equivalent to the input program, Mirage introduces a probabilistic equivalence verification procedure with strong theoretical guarantees. Our evaluation shows that Mirage outperforms existing approaches by up to 3.5$\times$ even for DNNs that are widely used and heavily optimized. Mirage is publicly available at https://github.com/mirage-project/mirage.
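The abstract does not say how the probabilistic equivalence verification works. As a rough illustration of the general idea only, the sketch below compares two small tensor programs on random inputs drawn from a finite field, in the style of randomized polynomial identity testing. This is not Mirage's actual procedure or API: the modulus `P`, the helper names (`probably_equivalent`, `reference`, `candidate`, `wrong_candidate`), and the example programs are all assumptions made for this sketch.

```python
import random

# Illustrative prime modulus (2^31 - 1); an assumption, not a value from the paper.
P = 2_147_483_647


def matmul(a, b):
    """Matrix product of two list-of-lists matrices, with entries reduced mod P."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[t] * b[t][j] for t in range(inner)) % P for j in range(cols)]
        for row in a
    ]


def add(a, b):
    """Elementwise sum of two equally shaped matrices, reduced mod P."""
    return [[(x + y) % P for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def random_matrix(rng, rows, cols):
    """Matrix with entries drawn uniformly from {0, ..., P - 1}."""
    return [[rng.randrange(P) for _ in range(cols)] for _ in range(rows)]


def probably_equivalent(f, g, shapes, trials=8, seed=0):
    """Return False as soon as f and g disagree on a random input, else True.

    A True result is only probabilistic evidence of equivalence; a False
    result is a concrete counterexample.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        inputs = [random_matrix(rng, r, c) for r, c in shapes]
        if f(*inputs) != g(*inputs):
            return False
    return True


def reference(a, b, c):
    # Hypothetical input program: two matrix products followed by a sum.
    return add(matmul(a, b), matmul(a, c))


def candidate(a, b, c):
    # Algebraically transformed program: one sum followed by one product.
    return matmul(a, add(b, c))


def wrong_candidate(a, b, c):
    # Deliberately incorrect rewrite that drops the second term.
    return matmul(a, b)


shapes = [(4, 3), (3, 5), (3, 5)]
print(probably_equivalent(reference, candidate, shapes))
print(probably_equivalent(reference, wrong_candidate, shapes))
```

Working modulo a prime keeps the arithmetic exact, so the two programs are compared without floating-point tolerance, and a rewrite that is not an algebraic identity disagrees with the reference on almost every random input.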