Compiling Recurrences over Dense and Sparse Arrays

Recurrence equations lie at the heart of many computational paradigms, including dynamic programming, graph analysis, and linear solvers. These equations are often expensive to compute, and much work has gone into optimizing them for different situations. Recurrence implementations span a large design space defined by the recurrence itself (e.g., the Viterbi and Floyd-Warshall algorithms), the choice of data structures (e.g., dense and sparse matrices), and the loop order. Optimized library implementations do not exist for most points in this design space, so developers must often implement and optimize recurrences by hand. We present a general framework for compiling recurrence equations into native code corresponding to any valid point in this design space. In this framework, users specify a system of recurrences, the types of the data structures that store the inputs and outputs, and a set of scheduling primitives for optimization. A greedy algorithm then takes this specification and lowers it into a native program that respects the dependencies inherent to the recurrence equations. We describe the compiler transformations necessary to lower this high-level specification into native parallel code for either sparse or dense data structures, and we provide an algorithm for determining whether the recurrence system is solvable with the provided scheduling primitives. We evaluate the performance and correctness of the generated code on computational tasks from domains including dense and sparse matrix solvers, dynamic programming, graph problems, and sparse tensor algebra. We demonstrate that the generated code achieves performance competitive with handwritten library implementations.
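
To make the design space concrete, the sketch below evaluates one recurrence, forward substitution for a lower-triangular system, x[i] = (b[i] - sum over j < i of L[i][j] * x[j]) / L[i][i], over two storage formats. It is a handwritten illustration, not output of the framework described above: the function names (solve_dense, to_csr, solve_csr) and the CSR layout are assumptions chosen for the example. In both versions the outer loop runs over ascending i, which is a loop order that respects the dependency of x[i] on every earlier x[j].

```python
def solve_dense(L, b):
    """Forward substitution with L stored as a dense list of rows."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):  # ascending i: every x[j] with j < i is already final
        acc = b[i]
        for j in range(i):
            acc -= L[i][j] * x[j]
        x[i] = acc / L[i][i]
    return x


def to_csr(L):
    """Convert a dense matrix to (indptr, indices, data), keeping nonzeros only."""
    indptr, indices, data = [0], [], []
    for row in L:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def solve_csr(indptr, indices, data, b):
    """Same recurrence, but the inner loop visits only the stored nonzeros."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):  # same dependency-respecting outer order
        acc = b[i]
        diag = None
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            if j < i:
                acc -= data[p] * x[j]
            elif j == i:
                diag = data[p]
        if diag is None:
            raise ZeroDivisionError(f"row {i} has no stored diagonal entry")
        x[i] = acc / diag
    return x


# Hypothetical example data: a 3x3 lower-triangular system.
L = [[2.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 3.0, 4.0]]
b = [2.0, 3.0, 14.0]

x_dense = solve_dense(L, b)
x_sparse = solve_csr(*to_csr(L), b)
```

The two functions compute the same values and differ only in how the inner loop iterates over stored entries. Writing each such variant by hand for every recurrence, format, and loop order is the manual effort the abstract refers to.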
