Programs are typically accelerated by recognizing code idioms that match high-performance libraries or hardware interfaces. However, recognizing such idioms automatically is challenging. The idiom recognition machinery is difficult to write and requires expert knowledge. In addition, slight variations in the input program can hide an idiom and defeat the recognizer. This paper advocates the use of a minimalist functional array language supporting a small but expressive set of operators. The minimalist design leads to a tiny set of rewrite rules, which encode the language semantics. Crucially, the same minimalist language is also used to encode idioms. This removes the need for hand-crafted analysis passes and for learning a complex domain-specific language to define the idioms. Coupled with equality saturation, this approach is able to match the core functions from the BLAS and PyTorch libraries on a set of computational kernels. When C programs using BLAS are generated from the high-level minimalist language, the approach achieves a geometric mean speedup of 1.46x over reference C kernel implementations.
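To make the idea concrete, the following is a minimal sketch of idiom recognition by term rewriting, in which both a language rule and an idiom are written as patterns over the same small term language. Everything in it is an assumption made for illustration: the operator names (`map`, `zip`, `map2`, `sum`, `mul`), the two rules, and the `("call", "dot", ...)` target are hypothetical and are not the language or rule set of the paper. The sketch also applies rules greedily bottom-up, which is a simplification standing in for equality saturation.

```python
# Terms are nested tuples; strings starting with "?" are pattern variables.
# All operator names and rules below are hypothetical, for illustration only.

def match(pattern, term, env):
    """Return an extended binding dict if `term` matches `pattern`, else None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            return env if env[pattern] == term else None
        return {**env, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None


def substitute(template, env):
    """Instantiate a rule's right-hand side with the bindings in `env`."""
    if isinstance(template, str) and template.startswith("?"):
        return env[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, env) for t in template)
    return template


def rewrite(term, rules):
    """Rewrite bottom-up until no rule applies (greedy, NOT equality saturation)."""
    if isinstance(term, tuple):
        term = tuple(rewrite(t, rules) for t in term)
    for lhs, rhs in rules:
        env = match(lhs, term, {})
        if env is not None:
            return rewrite(substitute(rhs, env), rules)
    return term


RULES = [
    # Hypothetical language rule: mapping over a zip equals a binary map.
    (("map", "?f", ("zip", "?a", "?b")), ("map2", "?f", "?a", "?b")),
    # Hypothetical idiom: a sum of elementwise products is a dot product.
    (("sum", ("map2", "mul", "?a", "?b")), ("call", "dot", "?a", "?b")),
]

# A variation of the dot product that does not match the idiom directly.
program = ("sum", ("map", "mul", ("zip", "xs", "ys")))
result = rewrite(program, RULES)
print(result)
```

In this sketch the input program does not match the idiom pattern as written; the language rule first normalizes it, after which the idiom applies. A greedy rewriter commits to one rule order and can miss matches that another order would expose, whereas equality saturation keeps all equivalent forms and so avoids that ordering problem.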