An important challenge in machine learning is to predict the initial conditions under which a given neural network will be trainable. We present a method for predicting the trainable regime in parameter space for deep feedforward neural networks (DNNs) based on reconstructing the input from subsequent activation layers via a cascade of single-layer auxiliary networks. We show that a single epoch of training of the shallow cascade networks is sufficient to predict the trainability of the deep feedforward network on a range of datasets (MNIST, CIFAR10, FashionMNIST, and white noise), thereby providing a significant reduction in overall training time. We achieve this by computing the relative entropy between reconstructed images and the original inputs, and show that this probe of information loss is sensitive to the phase behaviour of the network. We further demonstrate that this method generalizes to residual neural networks (ResNets) and convolutional neural networks (CNNs). Moreover, our method illustrates the network's decision-making process by displaying the changes performed on the input data at each layer, which we demonstrate for both a DNN trained on MNIST and the vgg16 CNN trained on the ImageNet dataset. Our results provide a technique for significantly accelerating the training of large neural networks.