An important challenge in machine learning is to predict the initial conditions under which a given neural network will be trainable. We present a method for predicting the trainable regime in parameter space for deep feedforward neural networks (DNNs) based on reconstructing the input from subsequent activation layers via a cascade of single-layer auxiliary networks. We show that a single epoch of training of the shallow cascade networks is sufficient to predict the trainability of the deep feedforward network on a range of datasets (MNIST, CIFAR10, FashionMNIST, and white noise), thereby providing a significant reduction in overall training time. We achieve this by computing the relative entropy between reconstructed images and the original inputs, and show that this probe of information loss is sensitive to the phase behaviour of the network. We further demonstrate that this method generalizes to residual neural networks (ResNets) and convolutional neural networks (CNNs). Moreover, our method illustrates the network's decision-making process by displaying the changes performed on the input data at each layer, which we demonstrate for both a DNN trained on MNIST and the vgg16 CNN trained on the ImageNet dataset. Our results provide a technique for significantly accelerating the training of large neural networks.