The information-theoretic framework promises to explain the predictive power of neural networks. In particular, information plane analysis, which measures the mutual information (MI) between input and representation and between representation and output, should give rich insights into the training process. This approach, however, has been shown to depend strongly on the choice of MI estimator. The problem is amplified for deterministic networks when the MI between input and representation is infinite. In that case, the estimated values are determined by the estimation method used and do not adequately represent the training process from an information-theoretic perspective. In this work, we show that dropout with continuously distributed noise ensures that the MI is finite. We demonstrate in a range of experiments that this enables a meaningful information plane analysis for a class of dropout neural networks that is widely used in practice.
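As a rough illustration of what "dropout with continuously distributed noise" can look like, the sketch below applies multiplicative Gaussian noise to a hidden activation. Gaussian dropout is assumed here as one example of continuous noise; the function name `gaussian_dropout`, the weights `W`, the input `x`, and the noise level `sigma` are all hypothetical and not taken from the paper. The code only shows that a fixed input yields a spread of representations rather than a single point; it does not estimate MI or prove finiteness.

```python
import numpy as np


def gaussian_dropout(activations, sigma, rng):
    """Multiply each activation by independent noise drawn from N(1, sigma^2).

    Assumption: Gaussian multiplicative noise stands in for the
    "continuously distributed noise" mentioned in the abstract.
    """
    noise = rng.normal(loc=1.0, scale=sigma, size=activations.shape)
    return activations * noise


rng = np.random.default_rng(0)

# Hypothetical toy layer: one fixed input and a fixed weight matrix.
x = np.array([0.5, -1.2, 2.0])
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 2.0, 0.0]])

# Deterministic representation: the same x always maps to the same h.
h = np.tanh(W @ x)

# Noisy representation: the same x now maps to a continuous distribution.
samples = np.stack([gaussian_dropout(h, 0.3, rng) for _ in range(1000)])

print("deterministic h:", h)
print("mean of noisy samples:", samples.mean(axis=0))
print("std of noisy samples:", samples.std(axis=0))
```

With the deterministic layer, the representation of a given input is a single point. With the noisy layer, it is a continuous distribution centred on that point, which is the setting the abstract refers to.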