Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training, matching the generalization of dense models while enabling sparse training and inference. Although the resulting models are highly sparse and theoretically cheaper to train, achieving speedups with unstructured sparsity on real-world hardware is challenging. In this work, we propose a DST method to learn a variant of structured N:M sparsity; acceleration of N:M sparsity in general is commonly supported on commodity hardware. Furthermore, we motivate the generalization performance of our specific N:M sparsity (constant fan-in) with both a theoretical analysis and empirical results, present a condensed representation with a reduced parameter and memory footprint, and demonstrate reduced inference time compared to dense models with a naive PyTorch CPU implementation of the condensed representation. Our source code is available at https://github.com/calgaryml/condensed-sparsity
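To make the constant fan-in constraint and the condensed representation concrete, the following is a minimal sketch rather than the implementation in the repository above. It assumes that "constant fan-in" means every output neuron keeps exactly the same number of incoming weights, and that a condensed layer can be stored as two small arrays: the kept values and their input indices. The names `condense` and `condensed_forward` are hypothetical, NumPy stands in for PyTorch to keep the example dependency-light, and the magnitude-based selection is only a placeholder for a learned DST mask.

```python
import numpy as np


def condense(dense_weight, fan_in):
    """Build a condensed (values, indices) pair from a dense weight matrix.

    Assumption: the mask is chosen by magnitude purely for illustration;
    a DST method would learn which connections to keep during training.
    """
    # Column indices of the `fan_in` largest-magnitude weights in each row.
    idx = np.argsort(-np.abs(dense_weight), axis=1)[:, :fan_in]
    idx = np.sort(idx, axis=1)  # keep input indices in ascending order
    # Gather the kept weights: shape (out_features, fan_in).
    values = np.take_along_axis(dense_weight, idx, axis=1)
    return values, idx


def condensed_forward(x, values, idx):
    """Compute y = x @ W_sparse.T using only the condensed arrays."""
    # x has shape (batch, in_features); indexing with idx of shape
    # (out_features, fan_in) yields shape (batch, out_features, fan_in).
    gathered = x[:, idx]
    # Weighted sum over the fan-in axis gives (batch, out_features).
    return np.einsum("bok,ok->bo", gathered, values)


rng = np.random.default_rng(0)
weight = rng.standard_normal((4, 8))   # 4 output neurons, 8 inputs
values, idx = condense(weight, fan_in=2)
x = rng.standard_normal((3, 8))        # batch of 3 inputs
y = condensed_forward(x, values, idx)  # shape (3, 4)
```

Because every row keeps the same number of weights, the values and indices form dense rectangular arrays of shape (out_features, fan_in), so no ragged or per-row offset storage is needed. In this toy case the layer stores 8 values and 8 indices instead of 32 dense weights.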