We present an algorithm for normalizing \emph{Batched Einstein Summation} expressions by mapping mathematically equivalent formulations to a unique normal form. Batches of einsums with the same Einstein notation that exhibit substantial data reuse appear frequently in finite element methods (FEM), numerical linear algebra, and computational chemistry. To exploit this temporal locality for high performance, we consider groups of einsums in batched form. Representations of equivalent batched einsums may differ due to index renaming, permutations within the batch, and the commutativity and associativity of multiplication. The lack of a canonical representation hinders the reuse of optimization and tuning knowledge in software systems. To this end, we develop a novel encoding of batched einsums as colored graphs and apply graph canonicalization to derive a normal form. In addition to the canonicalization algorithm, we propose a representation of einsums using functional array operands and provide a strategy for transferring transformations that operate on the normal form to \emph{functional batched einsums} that exhibit the same normal form, which is crucial for fusing surrounding computations for memory-bound einsums. We evaluate our approach against JAX and observe a geomean speedup of $4.7\times$ for einsums from the TCCG benchmark suite and an FEM solver.
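To make the notion of a normal form concrete, the following Python sketch shows how two sources of variation named in the abstract, index renaming and reordering of operands under commutative multiplication, can be collapsed to one representative string for a single einsum. It is an illustrative assumption, not the colored-graph canonicalization the abstract describes: the function name `canonical_einsum` is hypothetical, the search is brute force over all operand orders (factorial cost), and it handles neither batches nor operand reuse across a batch. It also assumes an explicit `->` in the subscript string and at most 26 distinct indices.

```python
from itertools import permutations


def canonical_einsum(spec: str) -> str:
    """Brute-force normal form of one einsum subscript string (hypothetical helper).

    Two specs that differ only by index renaming and by the order of their
    input operands map to the same returned string.
    """
    inputs_part, output = spec.replace(" ", "").split("->")
    inputs = inputs_part.split(",")

    candidates = []
    for order in permutations(inputs):
        # Rename indices to a, b, c, ... in order of first appearance,
        # scanning the inputs in this order and then the output.
        rename = {}
        for ch in "".join(order) + output:
            if ch not in rename:
                rename[ch] = chr(ord("a") + len(rename))
        renamed_inputs = ["".join(rename[c] for c in op) for op in order]
        renamed_output = "".join(rename[c] for c in output)
        candidates.append(",".join(renamed_inputs) + "->" + renamed_output)

    # Pick the lexicographically smallest candidate as the representative.
    return min(candidates)
```

With this sketch, `canonical_einsum("ij,jk->ik")`, `canonical_einsum("xy,yz->xz")`, and `canonical_einsum("jk,ij->ik")` all return the same string, while `canonical_einsum("ij,jk->ki")` returns a different one because the output axes are transposed. A graph-based encoding, as proposed in the abstract, avoids the factorial enumeration used here.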