In this work, we investigate the existence and effect of percolation in training deep neural networks (NNs) with dropout. Dropout methods are regularisation techniques for training NNs, first introduced by G. Hinton et al. (2012). At each training step, these methods temporarily remove connections in the NN at random and update the remaining subnetwork with stochastic gradient descent (SGD). Removing connections from a network at random resembles percolation, a paradigmatic model of statistical physics. If dropout were to remove enough connections that no path remained between the input and output of the NN, then the NN could not make predictions informed by the data. We study new percolation models that mimic dropout in NNs and characterise the relationship between network topology and this path problem. The theory shows the existence of a percolative effect in dropout. We also show that this percolative effect can cause a breakdown when training NNs without biases using dropout, and we argue heuristically that this breakdown extends to NNs with biases.
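The path problem described above can be illustrated with a small simulation. The sketch below is not the percolation model studied in this work, whose details are not given here. It assumes a fully connected feed-forward network in which each connection is kept independently with probability `keep_prob`, and it estimates how often at least one input-to-output path survives. The function names `sample_masks`, `has_input_output_path`, and `path_probability` are hypothetical and chosen for illustration only.

```python
import numpy as np


def sample_masks(widths, keep_prob, rng):
    """Draw one boolean keep-mask per pair of consecutive layers.

    Assumption: every connection is kept independently with
    probability keep_prob (a simple bond-percolation model).
    """
    return [
        rng.random((n_in, n_out)) < keep_prob
        for n_in, n_out in zip(widths[:-1], widths[1:])
    ]


def has_input_output_path(masks):
    """Return True if some input node still reaches some output node."""
    # Every input node is reachable at the start.
    reachable = np.ones(masks[0].shape[0], dtype=bool)
    for mask in masks:
        # A node in the next layer is reachable if at least one kept
        # connection arrives from a reachable node in this layer.
        reachable = (reachable[:, None] & mask).any(axis=0)
        if not reachable.any():
            return False
    return True


def path_probability(widths, keep_prob, trials=1000, seed=0):
    """Monte Carlo estimate of the probability that a path survives."""
    rng = np.random.default_rng(seed)
    hits = sum(
        has_input_output_path(sample_masks(widths, keep_prob, rng))
        for _ in range(trials)
    )
    return hits / trials
```

Calling, for example, `path_probability([10, 10, 10, 10], keep_prob=0.1)` and varying `widths` or `keep_prob` gives a rough feel for how network topology affects the chance that the input is disconnected from the output under this assumed model.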