The lottery ticket hypothesis for deep neural networks emphasizes the importance of the initialization used to re-train the sparser networks obtained through the iterative magnitude pruning process. However, an explanation for why the specific initialization proposed by the lottery ticket hypothesis tends to yield better generalization (and training) performance has been lacking. Moreover, the principles underlying iterative magnitude pruning, such as the pruning of smaller-magnitude weights and the role of the iterative process itself, are not yet fully understood. In this work, we attempt to provide insights into these phenomena by empirically studying the volume/geometry and loss-landscape characteristics of the solutions obtained at various stages of the iterative magnitude pruning process.
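The core of one iterative magnitude pruning round, pruning the smallest-magnitude surviving weights and rewinding the survivors to their original initialization, can be sketched roughly as follows. This is a minimal illustration on a flat weight vector; the function name, the 20% per-round pruning fraction, and the numpy setup are assumptions for this sketch, not the paper's exact setup.

```python
import numpy as np

def imp_round(weights, init_weights, mask, prune_frac=0.2):
    """One round of iterative magnitude pruning with weight rewinding.

    Removes the smallest-magnitude weights among those still surviving,
    then resets the survivors to their original initialization values
    (the "lottery ticket" initialization). `prune_frac` is an
    illustrative per-round pruning fraction, not a prescribed value.
    """
    surviving = np.flatnonzero(mask)
    # Number of surviving weights to prune this round.
    k = int(len(surviving) * prune_frac)
    # Indices of the k smallest-magnitude surviving weights.
    smallest = surviving[np.argsort(np.abs(weights[surviving]))[:k]]
    new_mask = mask.copy()
    new_mask[smallest] = 0.0
    # Rewind: surviving weights restart from their initial values.
    rewound = init_weights * new_mask
    return rewound, new_mask

# Toy usage: "train" is stood in for by adding noise to the init.
rng = np.random.default_rng(0)
init = rng.normal(size=10)
trained = init + rng.normal(scale=0.5, size=10)
mask = np.ones(10)
rewound, mask = imp_round(trained, init, mask, prune_frac=0.2)
```

In the actual procedure this round repeats: the rewound, masked network is re-trained, then pruned again, so sparsity compounds across iterations.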