The lottery ticket hypothesis for deep neural networks emphasizes the importance of the initialization used to retrain the sparser networks obtained through iterative magnitude pruning. An explanation for why the specific initialization proposed by the lottery ticket hypothesis tends to yield better generalization (and training) performance has been lacking. Moreover, the underlying principles of iterative magnitude pruning, such as the pruning of smaller-magnitude weights and the role of the iterative process, are not fully understood or explained. In this work, we attempt to provide insights into these phenomena by empirically studying the volume/geometry and loss landscape characteristics of the solutions obtained at various stages of the iterative magnitude pruning process.
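For readers unfamiliar with the procedure, the following is a minimal sketch of iterative magnitude pruning with a reset to the original initialization after each pruning round. It assumes a toy linear regression model trained by plain gradient descent, not the deep networks studied in this work; the function names `train` and `iterative_magnitude_pruning`, the pruning fraction, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def train(w, mask, X, y, lr=0.1, steps=200):
    """Gradient descent on mean squared error; pruned weights stay at zero."""
    w = w * mask
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask
    return w

def iterative_magnitude_pruning(w_init, X, y, rounds=3, prune_frac=0.5):
    """Return the final mask and the weights trained under that mask."""
    mask = np.ones_like(w_init)
    for _ in range(rounds):
        # Each round restarts from the original initialization (assumed reset rule).
        w = train(w_init, mask, X, y)
        alive = np.flatnonzero(mask)
        n_prune = int(prune_frac * alive.size)
        # Remove the smallest-magnitude weights among those still unpruned.
        drop = alive[np.argsort(np.abs(w[alive]))[:n_prune]]
        mask[drop] = 0.0
    return mask, train(w_init, mask, X, y)

# Synthetic data (assumption): only the first two features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))
w_true = np.zeros(16)
w_true[:2] = [3.0, -2.0]
y = X @ w_true

w_init = rng.normal(scale=0.1, size=16)
mask, w_sparse = iterative_magnitude_pruning(w_init, X, y)
```

With three rounds at a pruning fraction of 0.5, the 16 weights are reduced to 8, then 4, then 2. Pruning gradually, rather than in a single step, lets each round re-estimate which weights matter under the current mask.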