Sparsity is a growing trend in modern DNN models. Existing Sparse-Sparse Matrix Multiplication (SpMSpM) accelerators are each tailored to a single SpMSpM dataflow (i.e., Inner Product, Outer Product, or Gustavson's), which determines their overall efficiency. We demonstrate that this static decision is inherently suboptimal when workloads vary: different SpMSpM kernels exhibit different features (i.e., dimensions, sparsity pattern, sparsity degree), so each dataflow is better suited to different data sets. In this work we present Flexagon, the first reconfigurable SpMSpM accelerator capable of performing each SpMSpM computation with the dataflow that best matches it. Flexagon is based on a novel Merger-Reduction Network (MRN) that unifies reduction and merging in the same substrate, increasing efficiency. Flexagon also includes a 3-tier memory hierarchy specifically tailored to the different access characteristics of the input and output compressed matrices. Using detailed cycle-level simulation of contemporary DNN models from a variety of application domains, we show that Flexagon achieves average performance benefits of 4.59x, 1.71x, and 1.35x over state-of-the-art SIGMA-like, Sparch-like, and GAMMA-like accelerators, respectively (265%, 67%, and 18%, respectively, in terms of average performance/area efficiency).
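To make the three dataflows named above concrete, the following is a minimal software sketch of how Inner Product, Outer Product, and Gustavson's each traverse the operands of C = A x B. It is only an illustration and does not model Flexagon's hardware, its MRN, or its memory hierarchy. The dict-of-dicts storage and every function name (`to_rows`, `to_cols`, `inner_product`, `outer_product`, `gustavson`) are assumptions introduced here, not taken from the paper.

```python
from collections import defaultdict


def to_rows(dense):
    """Row-major compressed form: {row: {col: value}}, zeros dropped."""
    return {i: {j: v for j, v in enumerate(row) if v != 0}
            for i, row in enumerate(dense) if any(v != 0 for v in row)}


def to_cols(dense):
    """Column-major compressed form: {col: {row: value}}, zeros dropped."""
    cols = defaultdict(dict)
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                cols[j][i] = v
    return dict(cols)


def inner_product(a_rows, b_cols):
    """Each C[i][j] is a dot product: intersect indices, then reduce."""
    c = {}
    for i, a_row in a_rows.items():
        for j, b_col in b_cols.items():
            common = a_row.keys() & b_col.keys()
            if common:
                c.setdefault(i, {})[j] = sum(a_row[k] * b_col[k] for k in common)
    return c


def outer_product(a_cols, b_rows):
    """Column k of A times row k of B gives a partial matrix; merge them all."""
    c = {}
    for k in a_cols.keys() & b_rows.keys():
        for i, a in a_cols[k].items():
            row = c.setdefault(i, {})
            for j, b in b_rows[k].items():
                row[j] = row.get(j, 0) + a * b
    return c


def gustavson(a_rows, b_rows):
    """Row-wise: row i of C is the sum of rows of B scaled by A[i][k]."""
    c = {}
    for i, a_row in a_rows.items():
        acc = {}
        for k, a in a_row.items():
            for j, b in b_rows.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a * b
        if acc:
            c[i] = acc
    return c


# Small example with hypothetical values chosen only for illustration.
A = [[1, 0, 2],
     [0, 0, 0],
     [0, 3, 0]]
B = [[0, 4, 0],
     [5, 0, 0],
     [0, 0, 6]]

c_ip = inner_product(to_rows(A), to_cols(B))
c_op = outer_product(to_cols(A), to_rows(B))
c_gu = gustavson(to_rows(A), to_rows(B))
assert c_ip == c_op == c_gu
```

All three functions compute the same product, but they differ in which operand is streamed, how much index intersection is wasted on empty results (Inner Product), and how many partial sums must be merged (Outer Product and Gustavson's). Those costs depend on the dimensions and sparsity of the inputs, which is the reason the abstract gives for selecting the dataflow per kernel.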