We study the problem of overcoming exponential sample complexity in differential entropy estimation under Gaussian convolutions. Specifically, we consider estimating the differential entropy $h(X+Z)$ from $n$ independent and identically distributed samples of $X$, where $X$ and $Z$ are independent $D$-dimensional random variables, $X$ is subgaussian with bounded second moment, and $Z\sim\mathcal{N}(0,\sigma^2I_D)$. Under absolute-error loss, this problem has a parametric estimation rate of $\frac{c^D}{\sqrt{n}}$, which grows exponentially with the data dimension $D$ and is often prohibitive in applications. We overcome this exponential sample complexity by projecting $X$ onto a low-dimensional space via principal component analysis (PCA) before estimating the entropy, and we show that the asymptotic error overhead vanishes as the unexplained variance of the PCA vanishes. This implies near-optimal performance for inherently low-dimensional structures embedded in high-dimensional spaces, such as hidden-layer outputs of deep neural networks (DNNs), so the method can be used to estimate mutual information (MI) in DNNs. We provide numerical results verifying the performance of our PCA approach on Gaussian and spiral data. We also apply our method to the analysis of information flow through neural network layers (cf. the information bottleneck), measuring mutual information in a noisy fully connected network and a noisy convolutional neural network (CNN) for MNIST classification.
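The project-then-estimate idea can be illustrated with a short sketch. This is not the authors' implementation. It assumes (i) a plug-in estimator that takes the entropy of the Gaussian mixture $\hat P_n * \mathcal{N}(0,\sigma^2 I_d)$ centered at the projected samples, evaluated by Monte Carlo, and (ii) that the $D-d$ discarded directions carry only noise. Under (ii) the split is exact: if $X$ lies in the span of an orthonormal $U\in\mathbb{R}^{D\times d}$, then $h(X+Z) = h(U^\top X + Z_d) + \frac{D-d}{2}\log(2\pi e\sigma^2)$ with $Z_d\sim\mathcal{N}(0,\sigma^2 I_d)$. The function names `pca_project`, `mixture_entropy_mc` and `pca_entropy_estimate` are hypothetical.

```python
import numpy as np


def pca_project(samples: np.ndarray, d: int) -> np.ndarray:
    """Project n x D samples onto their top-d principal directions, returning n x d."""
    centered = samples - samples.mean(axis=0)  # entropy is shift-invariant
    # Rows of vt are the principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:d].T


def mixture_entropy_mc(centers: np.ndarray, sigma: float, n_mc: int = 2000, rng=None) -> float:
    """Monte Carlo estimate (in nats) of h(P_n * N(0, sigma^2 I)), where P_n is the
    empirical distribution of the rows of `centers` (a plug-in estimator)."""
    rng = np.random.default_rng(rng)
    n, d = centers.shape
    # Sample Y from the Gaussian mixture: choose a center uniformly, then add noise.
    y = centers[rng.integers(0, n, size=n_mc)] + sigma * rng.standard_normal((n_mc, d))
    # Squared distances between every draw and every center (n_mc x n).
    sq = (y**2).sum(axis=1)[:, None] + (centers**2).sum(axis=1)[None, :] - 2.0 * y @ centers.T
    log_comp = -sq / (2.0 * sigma**2) - 0.5 * d * np.log(2.0 * np.pi * sigma**2)
    # log p(y) = logsumexp over components - log n, computed stably.
    m = log_comp.max(axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1)) - np.log(n)
    return float(-log_p.mean())


def pca_entropy_estimate(samples: np.ndarray, sigma: float, d: int, n_mc: int = 2000, rng=None) -> float:
    """Estimate h(X + Z), Z ~ N(0, sigma^2 I_D), from n x D samples of X.
    Assumption: the D - d discarded directions contribute pure Gaussian noise."""
    D = samples.shape[1]
    h_low = mixture_entropy_mc(pca_project(samples, d), sigma, n_mc, rng)
    return h_low + 0.5 * (D - d) * np.log(2.0 * np.pi * np.e * sigma**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, d, sigma, n = 10, 2, 1.0, 1000
    # X lies in a random 2-D subspace of R^10: X = U W with W ~ N(0, I_2).
    U, _ = np.linalg.qr(rng.standard_normal((D, d)))
    X = rng.standard_normal((n, d)) @ U.T
    est = pca_entropy_estimate(X, sigma, d, rng=1)
    # Closed form: X + Z is Gaussian with eigenvalues 1 + sigma^2 (d times) and sigma^2 (D - d times).
    true = 0.5 * d * np.log(2 * np.pi * np.e * (1 + sigma**2)) + 0.5 * (D - d) * np.log(2 * np.pi * np.e * sigma**2)
    print(f"estimate {est:.3f}  closed form {true:.3f}")
```

The Monte Carlo step only ever works in $d$ dimensions, which is where the savings over a $D$-dimensional estimator come from. In practice $d$ might be chosen as the smallest value whose unexplained variance falls below a tolerance. For a noisy layer $T = f(X) + Z$ with deterministic $f$, $h(T\mid X) = h(Z) = \frac{D}{2}\log(2\pi e\sigma^2)$, so $I(X;T)$ could be estimated by passing the activations $f(x_i)$ to the entropy estimator and subtracting that constant.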