We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks spanning a wide range of architectures and sizes, trained with different optimization methods, regularization techniques, data augmentation techniques, and weight initializations, lie on the same manifold in the prediction space. We study the details of this manifold and find that networks with different architectures follow distinguishable trajectories, whereas other factors have minimal influence; larger networks train along a manifold similar to that of smaller networks, only faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
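To make the notion of an "effectively low-dimensional manifold in prediction space" concrete, the following is a minimal sketch of one way such a check could be set up. It is an illustration under stated assumptions, not the procedure used in this work: each checkpoint is represented by its per-sample class probabilities, checkpoints are compared with a mean per-sample Bhattacharyya distance, and the resulting distance matrix is double-centered in the style of multidimensional scaling so that its eigenvalue spectrum can be inspected. The function names and the synthetic `toy_trajectory` data are hypothetical.

```python
import numpy as np


def bhattacharyya_distance(p, q, eps=1e-12):
    """Mean per-sample Bhattacharyya distance between two prediction sets.

    p, q: arrays of shape (n_samples, n_classes) whose rows sum to 1.
    Assumption: averaging over samples is one reasonable way to compare
    two models; it is not claimed to be the authors' choice.
    """
    coeff = np.sum(np.sqrt(p * q), axis=1)  # Bhattacharyya coefficient per sample
    return float(np.mean(-np.log(np.clip(coeff, eps, 1.0))))


def pairwise_distances(models):
    """Symmetric matrix of distances between all pairs of checkpoints."""
    m = len(models)
    d = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            d[i, j] = d[j, i] = bhattacharyya_distance(models[i], models[j])
    return d


def embedding_spectrum(d):
    """Eigenvalues of the double-centered distance matrix, largest magnitude first.

    Assumption: the distance is treated like a squared distance, as in
    classical multidimensional scaling. Eigenvalues may be negative.
    """
    m = d.shape[0]
    centering = np.eye(m) - np.ones((m, m)) / m
    gram = -0.5 * centering @ d @ centering
    eigvals = np.linalg.eigvalsh(gram)
    return eigvals[np.argsort(-np.abs(eigvals))]


def toy_trajectory(n_checkpoints=20, n_samples=200, n_classes=10, seed=0):
    """Hypothetical stand-in for training checkpoints: predictions that
    sharpen from uniform toward the true labels."""
    rng = np.random.default_rng(seed)
    onehot = np.eye(n_classes)[rng.integers(0, n_classes, n_samples)]
    checkpoints = []
    for t in np.linspace(0.0, 1.0, n_checkpoints):
        logits = 5.0 * t * onehot
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        checkpoints.append(probs)
    return checkpoints


if __name__ == "__main__":
    spectrum = embedding_spectrum(pairwise_distances(toy_trajectory()))
    share = np.abs(spectrum) / np.abs(spectrum).sum()
    print("share of spectrum in top 3 directions:", share[:3].sum())
```

If a few eigenvalues carry most of the spectrum, the checkpoints are well described by a low-dimensional embedding; with real networks, the list passed to `pairwise_distances` would hold the predicted probabilities of each saved checkpoint on a fixed set of samples.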