Dynamic Sparse Training (DST) is a rapidly evolving area of research that seeks to optimize the sparse initialization of a neural network by adapting its topology during training. It has been shown that under specific conditions, DST is able to outperform dense models. The key components of this framework are the pruning and growing criteria, which are repeatedly applied during the training process to adjust the network's sparse connectivity. While the growing criterion's impact on DST performance is relatively well studied, the influence of the pruning criterion remains overlooked. To address this issue, we design and perform an extensive empirical analysis of various pruning criteria to better understand their effect on the dynamics of DST solutions. Surprisingly, we find that most of the studied methods yield similar results. The differences become more significant in the low-density regime, where the best performance is predominantly given by the simplest technique: magnitude-based pruning. The code is provided at https://github.com/alooow/fantastic_weights_paper
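To make the prune-and-grow cycle described above concrete, the following is a minimal NumPy sketch of a single topology update that uses magnitude-based pruning. It is an illustration under stated assumptions, not the implementation from the linked repository: the function name `prune_and_grow`, the `update_fraction` parameter, the choice of uniform random growth, and the zero initialization of regrown weights are all hypothetical choices made for this example.

```python
import numpy as np


def prune_and_grow(weights, mask, update_fraction, rng):
    """One illustrative DST topology update (hypothetical helper).

    Pruning criterion: drop the active weights with the smallest magnitude.
    Growing criterion: re-activate the same number of connections, chosen
    uniformly at random (an assumption made for this sketch).
    """
    shape = weights.shape
    flat_w = weights.flatten()             # flatten() returns a copy
    flat_m = mask.astype(bool).flatten()

    active = np.flatnonzero(flat_m)
    # Growth candidates are the connections that were inactive before
    # pruning, so a connection pruned in this step is not regrown at once.
    inactive = np.flatnonzero(~flat_m)

    # Number of connections to swap; capped by the available candidates.
    k = min(int(update_fraction * active.size), inactive.size)
    if k == 0:
        return flat_w.reshape(shape), flat_m.reshape(shape)

    # Magnitude-based pruning: remove the k active weights with smallest |w|.
    order = np.argsort(np.abs(flat_w[active]))
    pruned = active[order[:k]]
    flat_m[pruned] = False
    flat_w[pruned] = 0.0

    # Random growth: activate k previously inactive connections at zero.
    grown = rng.choice(inactive, size=k, replace=False)
    flat_m[grown] = True
    flat_w[grown] = 0.0

    return flat_w.reshape(shape), flat_m.reshape(shape)
```

Because the number of grown connections equals the number of pruned ones, the overall density of the layer stays constant across updates. In a training loop, a step like this would typically be applied every few hundred optimizer steps, with the mask re-applied to the weights after each gradient update. Other pruning criteria could be substituted by changing how `order` is computed.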