As Deep Neural Networks (DNNs) grow dramatically in size to achieve high accuracy, they require a large amount of computation and a large memory footprint. Pruning, which produces a sparse neural network, is one way to reduce the computational complexity of neural network processing. To maximize the performance of computations on such compressed data, dedicated sparse neural network accelerators have been introduced, but the complex circuits that match the indices of non-zero inputs and weights add large area and power overhead to the processing elements (PEs). As a result, a sparse PE is significantly larger than a dense PE, which raises an interesting question for designers: "Given the area, isn't it better to use a larger number of dense PEs despite their low utilization in sparse matrix computations?" In this paper, we show that the answer is "yes", and demonstrate an area- and energy-efficient method for sparse neural network computing on dense matrix multiplication hardware accelerators (Sparse-on-Dense).