Exploiting sparsity in deep neural networks (DNNs) is a promising way to meet the growing computational demands of modern DNNs. In practice, however, sparse DNN acceleration still faces a key challenge. To minimize the overhead of sparse acceleration, hardware designers have recently proposed structured sparse hardware support, which offers limited flexibility and requires extra model fine-tuning. Moreover, a sparse model fine-tuned for one structured sparse hardware design cannot be accelerated by hardware that supports a different structure. To bridge the gap between sparse DNN models and hardware, this paper proposes tensor approximation via structured decomposition (TASD), which leverages the distributive property in linear algebra to turn any sparse tensor into a series of structured sparse tensors. We then develop a software framework, TASDER, that accelerates DNNs by searching for a high-quality structured decomposition of the weight and activation tensors of each layer, so that they can be accelerated by any system with structured sparse hardware support. Evaluation results show that, by exploiting prior structured sparse hardware baselines, our method accelerates off-the-shelf dense and sparse DNNs without fine-tuning and improves energy-delay product by up to 83%, and by 74% on average.
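To make the decomposition idea concrete, the following is a minimal NumPy sketch of one plausible reading of TASD, not the TASDER implementation. It assumes N:M structured sparsity (at most N nonzeros in every group of M consecutive elements) and a greedy scheme in which each term is the N:M projection of the residual left by the previous terms. The names `nm_project` and `structured_decompose` are hypothetical and chosen only for illustration.

```python
import numpy as np


def nm_project(x, n, m):
    """Keep the n largest-magnitude entries in each group of m
    consecutive elements along the last axis; zero out the rest.
    (Hypothetical helper, assumed for illustration.)"""
    rows, cols = x.shape
    assert cols % m == 0, "last dimension must be divisible by m"
    groups = x.reshape(rows, cols // m, m)
    # Sort each group by descending magnitude and keep the first n indices.
    order = np.argsort(-np.abs(groups), axis=-1, kind="stable")
    keep = order[..., :n]
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=-1)
    return np.where(mask, groups, 0).reshape(rows, cols)


def structured_decompose(x, patterns):
    """Greedily split x into structured sparse terms.

    patterns is a list of (n, m) pairs, one per term. Returns the list
    of terms and the residual, so that x == sum(terms) + residual.
    (Assumed greedy scheme, not necessarily the one used by TASDER.)"""
    terms = []
    residual = x.copy()
    for n, m in patterns:
        term = nm_project(residual, n, m)
        terms.append(term)
        residual = residual - term
    return terms, residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Unstructured sparse matrix: roughly half of the entries are zero.
    a = rng.standard_normal((4, 8)) * (rng.random((4, 8)) < 0.5)
    b = rng.standard_normal((8, 3))

    terms, residual = structured_decompose(a, [(2, 4), (1, 4)])
    # Distributive property: (A1 + A2) @ B == A1 @ B + A2 @ B,
    # so each structured term can be multiplied separately.
    approx = sum(t @ b for t in terms)
    print("approximation error:", np.linalg.norm(a @ b - approx))
    print("residual norm:", np.linalg.norm(residual))
```

Because matrix multiplication distributes over addition, each term can be dispatched to structured sparse hardware on its own and the partial products summed. Any nonzero residual is the approximation error that a layer-wise search such as the one described above would need to keep small.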