Sparse matrix operations involve a large number of zero operands, which makes most of the operations redundant. The redundancy is magnified when a matrix operation executes repeatedly on sparse data. Optimizing matrix operations for sparsity involves reorganizing either the data or the computations, at either compile time or run time. Although compile-time techniques avoid introducing run-time overhead, they are either limited to simple sparse matrix operations that generate dense output and handle immutable sparse matrices, or require manual intervention to customize the technique to different matrix operations. We contribute a compile-time technique called SpComp that optimizes a sparse matrix operation by automatically customizing its computations to the positions of the non-zero values in the data. Our approach neither incurs run-time overhead nor requires manual intervention. It is also applicable to complex matrix operations that generate sparse output and handle mutable sparse matrices. We introduce a data-flow analysis, named Essential Indices Analysis, that statically collects symbolic information about the computations and helps the code generator reorganize them. The generated code includes piecewise-regular loops that are free from indirect references and amenable to further optimization. SpComp-generated code for sparse matrix-sparse vector multiplication (SpMSpV) shows a substantial performance gain over the state-of-the-art TACO compiler and piecewise-regular code generator. On average, we achieve a 79% performance gain against TACO and an 83% performance gain against the piecewise-regular code generator. When compared against the CHOLMOD library, SpComp-generated sparse Cholesky decomposition code shows a 65% performance gain on average.
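To make the idea of customizing computations to non-zero positions concrete, the following is a minimal toy sketch for SpMSpV, not SpComp's actual analysis or code generator. It assumes the sparsity patterns of the matrix and the vector are known ahead of time. The helper names (`essential_indices`, `to_segments`, `run_segments`) are hypothetical, and dense Python lists stand in for the real storage purely for readability. The sketch first keeps only the index pairs whose product can be non-zero, then merges consecutive column indices into contiguous ranges, so that the final loops use direct indexing with no index arrays.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]
Segment = Tuple[int, int, int]  # (row, col_start, col_end), end exclusive


def essential_indices(a_pattern: Set[Pair], x_pattern: Set[int]) -> List[Pair]:
    """Keep only (i, j) pairs where both A[i][j] and x[j] can be non-zero."""
    return sorted((i, j) for (i, j) in a_pattern if j in x_pattern)


def to_segments(pairs: List[Pair]) -> List[Segment]:
    """Merge sorted pairs into per-row runs of consecutive column indices."""
    segments: List[Segment] = []
    for i, j in pairs:
        if segments and segments[-1][0] == i and segments[-1][2] == j:
            row, start, _ = segments[-1]
            segments[-1] = (row, start, j + 1)  # extend the current run
        else:
            segments.append((i, j, j + 1))      # start a new run
    return segments


def run_segments(segments: List[Segment], a: List[List[float]],
                 x: List[float], n_rows: int) -> List[float]:
    """Execute y = A * x using only the precomputed regular loop ranges."""
    y = [0.0] * n_rows
    for i, lo, hi in segments:
        for j in range(lo, hi):  # contiguous range, no indirect references
            y[i] += a[i][j] * x[j]
    return y


# Illustrative data (assumed, not taken from the paper).
a = [[1, 2, 0, 0],
     [0, 0, 0, 0],
     [0, 3, 4, 5],
     [6, 0, 0, 0]]
x = [0, 1, 1, 1]
a_pattern = {(i, j) for i, row in enumerate(a) for j, v in enumerate(row) if v != 0}
x_pattern = {j for j, v in enumerate(x) if v != 0}

segments = to_segments(essential_indices(a_pattern, x_pattern))
print(segments)                        # [(0, 1, 2), (2, 1, 4)]
print(run_segments(segments, a, x, 4))  # [2.0, 0.0, 12.0, 0.0]
```

In this toy setting the pattern-dependent work (`essential_indices` and `to_segments`) plays the role of the compile-time step, while `run_segments` corresponds to the specialized code that runs on the actual values. Only rows 0 and 2 appear in the segments, so the sparsity of the output is also known before any values are read.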