Matrix multiplication accounts for a large fraction of the running time of many machine-learning algorithms. Accelerator chips that perform matrix multiplication faster than conventional processors, or even GPUs, are therefore of increasing interest. In this paper, we demonstrate a method of performing matrix multiplication without a scalar multiplier circuit. In many cases of practical interest, a single addition and a single on-chip copy operation suffice to replace each multiplication. It thus becomes possible to design a matrix-multiplier chip that, because it needs no time-, space-, and energy-consuming multiplier circuits, can hold many more processors and thereby provide a net speedup.
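To make the idea of multiplier-free matrix multiplication concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not necessarily the method of the paper: it assumes the entries of the left matrix are non-negative integers, and it scales a whole column by a constant by sorting the distinct entries, recursing on their consecutive differences, and rebuilding the products with running sums. The function names (`add_only_product`, `scale_vector`, `matmul_add_only`) and the `rounds` cutoff are hypothetical choices made for this example. In the rebuild step, each distinct value costs one addition, and each original entry then receives its product by a table lookup, which plays the role of a copy.

```python
def add_only_product(c, x):
    """Return c * x for a non-negative integer x using only doubling and addition."""
    total, addend = 0, c
    while x:
        if x & 1:            # this bit of x is set: include the current multiple of c
            total += addend
        addend += addend     # double the multiple of c by adding it to itself
        x >>= 1
    return total


def scale_vector(c, values, rounds=3):
    """Return [c * v for v in values] without multiplying data values.

    Assumption for this sketch: every entry of `values` is a non-negative integer.
    `rounds` is a hypothetical cutoff on the recursion depth.
    """
    distinct = sorted(set(values) - {0})
    if rounds == 0 or len(distinct) <= 1:
        # Base case: fall back to doubling-and-adding for each distinct value.
        table = {v: add_only_product(c, v) for v in distinct}
    else:
        # Consecutive differences of the sorted distinct values (first value kept).
        diffs = [distinct[0]] + [b - a for a, b in zip(distinct, distinct[1:])]
        scaled_diffs = scale_vector(c, diffs, rounds - 1)
        # Running sums of the scaled differences give c * v: one addition per value.
        table, running = {}, 0
        for v, d in zip(distinct, scaled_diffs):
            running += d
            table[v] = running
    table[0] = 0
    # Copy step: every original entry looks up its product.
    return [table[v] for v in values]


def matmul_add_only(A, B):
    """Multiply A (n x k, non-negative ints) by B (k x m, ints) using scale_vector."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0 for _ in range(m)] for _ in range(n)]
    for j in range(k):
        column = [A[i][j] for i in range(n)]
        for t in range(m):
            scaled = scale_vector(B[j][t], column)   # B[j][t] times column j of A
            for i in range(n):
                C[i][t] += scaled[i]
    return C
```

For example, `matmul_add_only([[1, 2], [3, 4]], [[5, 6], [7, 8]])` yields the same result as an ordinary matrix product. The sketch is written for readability rather than speed; any benefit of such a scheme would come from hardware that replaces multiplier circuits with adders, which plain Python cannot demonstrate.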