The output distribution of a neural network (NN) over the entire input space captures the complete input-output mapping, offering insight toward a more comprehensive understanding of the NN. Exhaustive enumeration or traditional Monte Carlo sampling of the entire input space can require impractical sampling time, especially for high-dimensional inputs. To make such sampling computationally feasible, we propose a novel Gradient-based Wang-Landau (GWL) sampler. We first draw the connection between the output distribution of an NN and the density of states (DOS) of a physical system. We then adapt the classic sampler for the DOS problem, the Wang-Landau algorithm, by replacing its random proposals with gradient-based Monte Carlo proposals. As a result, our GWL sampler explores under-explored subsets of the input space much more efficiently. Extensive experiments verify the accuracy of the output distribution generated by GWL and reveal several interesting findings. For example, in a binary image classification task, both a CNN and a ResNet mapped the majority of human-unrecognizable images to very negative logit values.
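To make the DOS analogy concrete, the following is a minimal sketch of the classic Wang-Landau algorithm with random proposals, not the GWL sampler itself. It rests on assumptions that are ours rather than the paper's: the "network" is a toy function that counts the ones in a 10-bit input, so the exact density of states is the binomial coefficient, and the function name, flatness threshold, and stopping rule are illustrative choices. GWL, as described above, would replace the random single-bit-flip proposal with a gradient-based one.

```python
import math
import random


def wang_landau_bit_counts(n_bits=10, ln_f_final=1e-3, flatness=0.8, seed=0):
    """Estimate ln g(k), where g(k) is the number of n_bits-bit inputs
    whose toy "output" (the count of ones) equals k.

    Hypothetical toy setup: the output function and all parameter
    defaults are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    k = sum(x)                      # current output value (bin index)
    ln_g = [0.0] * (n_bits + 1)     # running estimate of ln g per bin
    hist = [0] * (n_bits + 1)       # visit histogram for the flatness check
    ln_f = 1.0                      # modification factor, halved per stage

    while ln_f > ln_f_final:
        for _ in range(10000):
            # Random (symmetric) proposal: flip one uniformly chosen bit.
            i = rng.randrange(n_bits)
            k_new = k + (1 - 2 * x[i])
            # Accept with probability min(1, g(k) / g(k_new)).
            delta = ln_g[k] - ln_g[k_new]
            if delta >= 0 or rng.random() < math.exp(delta):
                x[i] ^= 1
                k = k_new
            # Update the estimate and histogram at the current bin.
            ln_g[k] += ln_f
            hist[k] += 1
        # When the histogram is roughly flat, refine the modification factor.
        if min(hist) >= flatness * (sum(hist) / len(hist)):
            hist = [0] * len(hist)
            ln_f /= 2

    # Normalize so that the estimated counts sum to 2 ** n_bits.
    m = max(ln_g)
    log_total = m + math.log(sum(math.exp(v - m) for v in ln_g))
    shift = n_bits * math.log(2) - log_total
    return [v + shift for v in ln_g]


if __name__ == "__main__":
    estimate = wang_landau_bit_counts()
    for k, ln_g_k in enumerate(estimate):
        exact = math.log(math.comb(10, k))
        print(f"k={k:2d}  estimated ln g={ln_g_k:6.3f}  exact ln g={exact:6.3f}")
```

Because acceptance favors bins with a low current estimate of g, the walker is pushed toward rarely visited output values, which is how the method reaches parts of the input space that uniform sampling would almost never hit.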