Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy

Entropy and mutual information in neural networks provide rich information on the learning process, but they have proven difficult to compute reliably in high dimensions. Indeed, in noisy and high-dimensional data, traditional estimates in ambient dimensions approach a fixed entropy and are prohibitively hard to compute. To address these issues, we leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures. Specifically, we define diffusion spectral entropy (DSE) in neural representations of a dataset as well as diffusion spectral mutual information (DSMI) between different variables representing data. First, we show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data that outperform classic Shannon entropy, nonparametric estimation, and mutual information neural estimation (MINE). We then study the evolution of representations in classification networks with supervised learning, self-supervision, or overfitting. We observe that (1) DSE of neural representations increases during training; (2) DSMI with the class label increases during generalizable learning but stays stagnant during overfitting; (3) DSMI with the input signal shows differing trends: on MNIST it increases, while on CIFAR-10 and STL-10 it decreases. Finally, we show that DSE can be used to guide better network initialization and that DSMI can be used to predict downstream classification accuracy across 962 models on ImageNet. The official implementation is available at https://github.com/ChenLiu-1996/DiffusionSpectralEntropy.
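As a rough illustration of the DSE idea, the sketch below computes a Shannon entropy over the normalized eigenvalue spectrum of a diffusion matrix built from a point cloud. It is not taken from the official repository. The Gaussian kernel, the bandwidth `sigma`, the diffusion time `t`, the symmetric normalization, and the function name `diffusion_spectral_entropy` are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np


def diffusion_spectral_entropy(X, sigma=1.0, t=1):
    """Entropy (in bits) of the normalized diffusion eigenvalue spectrum.

    Hypothetical sketch: the kernel choice, sigma, and t are assumptions,
    not details given in the abstract.
    """
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances, clipped at zero for stability.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian affinity matrix (assumed kernel).
    K = np.exp(-d2 / (2.0 * sigma ** 2))

    # Symmetric normalization D^{-1/2} K D^{-1/2}. It shares its eigenvalues
    # with the row-stochastic diffusion matrix D^{-1} K, and being symmetric
    # it can be passed to eigvalsh.
    deg = K.sum(axis=1)
    A = K / np.sqrt(np.outer(deg, deg))

    # Eigenvalue magnitudes raised to the diffusion time t, then normalized
    # so that they sum to one and can be treated as a distribution.
    eig = np.abs(np.linalg.eigvalsh(A)) ** t
    p = eig / eig.sum()
    p = p[p > 0]  # drop exact zeros so that log2 is defined

    return float(-np.sum(p * np.log2(p)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    collapsed = np.ones((8, 3))      # every point identical
    spread = np.eye(8) * 100.0       # points far apart relative to sigma
    noisy = rng.normal(size=(8, 3))  # generic point cloud
    for name, pts in [("collapsed", collapsed), ("spread", spread), ("noisy", noisy)]:
        print(name, diffusion_spectral_entropy(pts))
```

Under these assumptions, a collapsed representation (all points identical) yields an entropy near zero, while points that are mutually far apart relative to `sigma` yield an entropy near log2(n). This is consistent with reading DSE as a measure of how many diffusion dimensions a representation occupies.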
