We consider the problem of automatically decomposing operations over tensors or arrays so that they can be executed in parallel on multiple devices. We address two closely linked questions. First, what programming abstraction should systems for tensor-based computing offer to enable such decompositions? Second, given that abstraction, how should such systems automatically decompose a tensor-based computation? We argue that tensor-based systems should offer a programming abstraction based on an extended Einstein summation notation, which is a fully declarative, mathematical specification for tensor computations. We show that any computation specified in the Einstein summation notation can be rewritten into an equivalent tensor-relational computation, and that this rewrite generalizes existing notions of tensor parallelism such as "data parallel" and "model parallel." We consider the algorithmic problem of optimally computing a tensor-relational decomposition of a graph of operations specified in our extended Einstein summation notation, and we experimentally demonstrate the value of the algorithm that we develop.
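To make the connection between Einstein summation notation and the two forms of tensor parallelism concrete, consider a minimal sketch (not from the paper; the two-way split and the use of NumPy's `einsum` are illustrative assumptions). A matrix multiply written in einsum notation can be decomposed either by partitioning a free index, so each device computes an independent block of the output ("data parallel"), or by partitioning the contracted index, so each device computes a partial product that must be summed ("model parallel"):

```python
import numpy as np

# A matrix multiply in Einstein summation notation: C[i,j] = sum_k A[i,k] * B[k,j]
A = np.random.rand(4, 6)
B = np.random.rand(6, 8)
C = np.einsum('ik,kj->ij', A, B)

# "Data parallel": partition the free index i across (hypothetical) devices;
# each one computes an independent row block of C.
C_data = np.concatenate(
    [np.einsum('ik,kj->ij', A_block, B) for A_block in np.split(A, 2, axis=0)],
    axis=0)

# "Model parallel": partition the contracted index k; each device computes a
# partial product, and the partials are summed to form C.
C_model = sum(
    np.einsum('ik,kj->ij', A_k, B_k)
    for A_k, B_k in zip(np.split(A, 2, axis=1), np.split(B, 2, axis=0)))

assert np.allclose(C, C_data)
assert np.allclose(C, C_model)
```

Both decompositions produce the same result as the undecomposed contraction; which partitioning is cheaper depends on tensor shapes and communication costs, which is the kind of choice an automatic decomposition algorithm must make.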