We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks spanning a wide range of architectures and sizes, trained with different optimization methods, regularization techniques, data augmentation techniques, and weight initializations, lie on the same manifold in the prediction space. Studying this manifold in detail, we find that networks with different architectures follow distinguishable trajectories, while the other factors have minimal influence; that larger networks train along a manifold similar to that of smaller networks, only faster; and that networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
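As a rough illustration of what "effectively low-dimensional" could mean in prediction space, the sketch below treats each training checkpoint as a set of per-sample class probabilities, embeds it so that Euclidean distance matches the sample-averaged squared Hellinger distance, and measures how much variance the leading principal components capture. The abstract does not say which distance or embedding is used, so the Hellinger distance, the synthetic trajectory, and all function names here are assumptions made for illustration only.

```python
import numpy as np


def hellinger_embedding(probs):
    """Flatten predictions of shape (n_samples, n_classes) into one vector.

    Squared Euclidean distance between two such vectors equals the squared
    Hellinger distance between the predictions, averaged over samples.
    (Illustrative choice; the source does not specify the distance.)
    """
    n_samples = probs.shape[0]
    return np.sqrt(probs).ravel() / np.sqrt(2.0 * n_samples)


def explained_variance_ratio(checkpoints):
    """PCA spectrum of a list of checkpoints, as fractions of total variance."""
    x = np.stack([hellinger_embedding(p) for p in checkpoints])
    x = x - x.mean(axis=0, keepdims=True)  # center across checkpoints
    singular_values = np.linalg.svd(x, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()


def toy_trajectory(n_checkpoints=20, n_samples=200, n_classes=10, seed=0):
    """Hypothetical trajectory: predictions move from uniform to one-hot labels."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_classes, size=n_samples)
    one_hot = np.eye(n_classes)[labels]
    uniform = np.full((n_samples, n_classes), 1.0 / n_classes)
    ts = np.linspace(0.0, 1.0, n_checkpoints)
    return [(1.0 - t) * uniform + t * one_hot for t in ts]


if __name__ == "__main__":
    ratio = explained_variance_ratio(toy_trajectory())
    # Fraction of variance captured by the three leading components.
    print(ratio[:3])
```

In this toy setting each checkpoint lives in a space of `n_samples * n_classes` dimensions, yet the trajectory is captured by a handful of components. With real networks, `checkpoints` would instead hold softmax outputs saved on a fixed set of samples at different points in training.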