Probabilistic representation spaces convey information about a dataset, and to understand the effects of factors such as training loss and network architecture, we seek to compare the information content of such spaces. However, most existing methods for comparing representation spaces assume representations are points and neglect the distributional nature of probabilistic representations. Here, rather than building on point-based measures of comparison, we build on classic methods from the hard clustering literature. We generalize two information-theoretic methods for comparing hard clustering assignments so that they apply to general probabilistic representation spaces. We then propose a practical estimation method, based on fingerprinting a representation space with a sample of the dataset, that is applicable when the communicated information amounts to only a handful of bits. With unsupervised disentanglement as a motivating problem, we find information fragments that recur in individual latent dimensions across VAE and InfoGAN ensembles. Then, by comparing the full latent spaces of models, we find highly consistent information content across datasets, methods, and hyperparameters, even though there is often a point during training with substantial variety across repeat runs. Finally, we leverage the differentiability of the proposed method and perform model fusion by synthesizing the information content of multiple weak learners, each incapable of representing the global structure of the dataset. Across the case studies, the direct comparison of information content provides a natural basis for understanding how information is processed.
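To make the hard-clustering starting point concrete, the following is a minimal sketch of two standard information-theoretic comparisons of hard clustering assignments, variation of information (VI) and normalized mutual information (NMI). The abstract does not name which two measures the paper generalizes, so these serve only as illustrative examples of the classic point of departure; the function names are ours, not the paper's.

```python
# Illustrative baseline: information-theoretic comparison of two hard
# clusterings, the kind of measure the paper generalizes to probabilistic
# representation spaces. Pure-Python, natural-log units (nats).
from collections import Counter
from math import log, sqrt

def entropy(labels):
    """Shannon entropy H(A) of a hard clustering given as a label list."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """I(A; B) between two hard clusterings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    return sum((c / n) * log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in joint.items())

def variation_of_information(a, b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B); a metric on clusterings (0 iff
    the two clusterings are identical up to relabeling)."""
    return entropy(a) + entropy(b) - 2 * mutual_information(a, b)

def normalized_mutual_info(a, b):
    """NMI with geometric-mean normalization, in [0, 1]."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / sqrt(ha * hb)

# Relabeled copies of a clustering agree perfectly; independent ones share
# no information.
a = [0, 0, 1, 1, 2, 2]
b = [1, 1, 0, 0, 2, 2]   # same partition, different labels
c = [0, 1, 0, 1, 0, 1]   # statistically independent of a
print(variation_of_information(a, b))  # ~0.0
print(normalized_mutual_info(a, b))    # ~1.0
print(mutual_information(a, c))        # ~0.0
```

Both measures depend only on the joint distribution of cluster labels, which is exactly what breaks down once each item maps to a distribution over a continuous latent space rather than a single label, motivating the generalization the abstract describes.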