The Information Bottleneck (IB) principle offers an information-theoretic framework for analyzing the training process of deep neural networks (DNNs). It tracks the dynamics of two mutual information (MI) values: one between the hidden layer and the class label, and the other between the hidden layer and the DNN input. According to the hypothesis put forth by Shwartz-Ziv and Tishby (2017), the training process consists of two distinct phases: fitting and compression. The latter phase is believed to account for the good generalization performance exhibited by DNNs. Because estimating MI between high-dimensional random vectors is difficult, this hypothesis has only been verified for toy NNs or specific types of NNs, such as quantized NNs and dropout NNs. In this paper, we introduce a comprehensive framework for conducting IB analysis of general NNs. Our approach leverages the stochastic NN method proposed by Goldfeld et al. (2019) and incorporates a compression step to overcome the obstacles associated with high dimensionality: we estimate the MI between compressed representations of the high-dimensional random vectors. The proposed method is supported by both theoretical and practical justifications. Notably, we demonstrate the accuracy of our estimator through synthetic experiments with predefined MI values. Finally, we perform IB analysis on a convolutional DNN of close to real-world scale, which reveals new features of the MI dynamics.
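The idea of estimating MI between compressed representations, and of checking the estimate against a predefined MI value, can be illustrated with a minimal sketch. The choices below are assumptions made for illustration only and are not the compressor or estimator used in the paper: the compression step is plain PCA, the MI estimator is a Gaussian plug-in formula, and the high-dimensional data are an idealized noise-free linear embedding of correlated Gaussian pairs. The names `gaussian_mi`, `pca_compress`, and `run_demo` are hypothetical.

```python
import numpy as np


def gaussian_mi(x, y):
    """Plug-in MI estimate in nats, assuming (x, y) are jointly Gaussian.

    Assumption: this closed-form estimator stands in for whatever
    low-dimensional MI estimator one would actually use.
    """
    dx = x.shape[1]
    cov = np.cov(np.hstack([x, y]), rowvar=False)
    _, logdet_x = np.linalg.slogdet(cov[:dx, :dx])
    _, logdet_y = np.linalg.slogdet(cov[dx:, dx:])
    _, logdet_xy = np.linalg.slogdet(cov)
    return 0.5 * (logdet_x + logdet_y - logdet_xy)


def pca_compress(z, k):
    """Project the rows of z onto their top-k principal components.

    Assumption: PCA is used here only as a simple stand-in compressor.
    """
    zc = z - z.mean(axis=0)
    _, _, vt = np.linalg.svd(zc, full_matrices=False)
    return zc @ vt[:k].T


def run_demo(n=20000, d=2, ambient=64, rho=0.8, seed=0):
    """Return (true MI, estimated MI) for a synthetic Gaussian example."""
    rng = np.random.default_rng(seed)

    # Latent pairs: each coordinate pair (x_i, y_i) has correlation rho,
    # so the true MI is known in closed form.
    x = rng.standard_normal((n, d))
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal((n, d))
    true_mi = -0.5 * d * np.log(1.0 - rho**2)

    # Embed both variables into a high-dimensional ambient space with
    # injective linear maps (orthonormal columns), which preserves MI.
    qx, _ = np.linalg.qr(rng.standard_normal((ambient, d)))
    qy, _ = np.linalg.qr(rng.standard_normal((ambient, d)))
    x_high = x @ qx.T
    y_high = y @ qy.T

    # Compress back to d dimensions, then estimate MI in the low-dim space.
    est = gaussian_mi(pca_compress(x_high, d), pca_compress(y_high, d))
    return true_mi, est


if __name__ == "__main__":
    true_mi, est = run_demo()
    print(f"true MI = {true_mi:.4f} nats, estimated MI = {est:.4f} nats")
```

In this idealized setting the compression is lossless, so the low-dimensional estimate should land close to the predefined value. With real hidden-layer activations the compression is generally lossy, and the estimate describes the compressed representations rather than the original vectors.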