Symmetric and sparse tensors arise naturally in many domains, including linear algebra, statistics, physics, chemistry, and graph theory. Symmetric tensors are equal to their transposes, so in the $n$-dimensional case we can save up to a factor of $n!$ by avoiding redundant operations. Sparse tensors, on the other hand, are mostly zero, and we can save asymptotically by processing only the nonzeros. Unfortunately, specializing for both symmetry and sparsity at the same time is uniquely challenging. Optimizing for symmetry requires consideration of $n!$ transpositions of a triangular kernel, which can be complex and error-prone. Considering multiple transposed iteration orders and triangular loop bounds also complicates iteration through intricate sparse tensor formats. Additionally, since each combination of symmetry and sparse tensor format requires a specialized implementation, the number of cases grows combinatorially. A compiler is needed, but existing compilers cannot take advantage of both symmetry and sparsity within the same kernel. In this paper, we describe the first compiler that can automatically generate symmetry-aware code for sparse or structured tensor kernels. We introduce a taxonomy for symmetry in tensor kernels and show how to target each kind of symmetry. Our implementation demonstrates significant speedups over the non-symmetric state of the art, ranging from 1.36x for SSYMV to 30.4x for a 5-dimensional MTTKRP.
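To make the combined saving concrete for the simplest case ($n = 2$), the following is a minimal hand-written sketch of a symmetric sparse matrix-vector product that stores and visits only the upper triangle. It is an illustration under our own assumptions, not code generated by the compiler described here: the function names `ssymv_upper` and `spmv_full` and the dictionary-of-coordinates storage are hypothetical choices made for brevity.

```python
def ssymv_upper(n, upper, x):
    """Compute y = A @ x for a symmetric n-by-n matrix A.

    `upper` maps (i, j) with i <= j to the nonzero value A[i][j].
    Entries below the diagonal are implied by symmetry and never stored.
    """
    y = [0.0] * n
    for (i, j), a in upper.items():
        if i > j:
            raise ValueError("expected only upper-triangle entries (i <= j)")
        y[i] += a * x[j]
        if i != j:
            # A[j][i] equals A[i][j], so the value read once is reused
            # for the mirrored update instead of being stored and read again.
            y[j] += a * x[i]
    return y


def spmv_full(n, entries, x):
    """Reference kernel: every nonzero of A is stored and visited."""
    y = [0.0] * n
    for (i, j), a in entries.items():
        y[i] += a * x[j]
    return y


# A = [[2, 1, 0],
#      [1, 0, 3],
#      [0, 3, 4]]
upper = {(0, 0): 2.0, (0, 1): 1.0, (1, 2): 3.0, (2, 2): 4.0}

# Expand to the full matrix only to check the symmetric kernel.
full = dict(upper)
for (i, j), a in upper.items():
    full[(j, i)] = a

x = [1.0, 2.0, 3.0]
print(ssymv_upper(3, upper, x))  # [4.0, 10.0, 18.0]
print(spmv_full(3, full, x))     # [4.0, 10.0, 18.0]
```

Here the symmetric kernel reads 4 stored values where the full kernel reads 6. In higher dimensions the same idea requires one mirrored update per distinct permutation of the indices, together with special handling of entries whose indices repeat, which is where hand-written kernels become error-prone.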