We introduce an information-theoretic quantity with properties similar to those of mutual information that can be estimated from data without explicit assumptions about the underlying distribution. The quantity builds on a recently proposed matrix-based entropy, which uses the eigenvalues of a normalized Gram matrix to estimate the eigenvalues of an uncentered covariance operator in a reproducing kernel Hilbert space. We show that a difference of matrix-based entropies (DiME) is well suited to problems that involve maximizing the mutual information between random variables. Many methods for such tasks can lead to trivial solutions, whereas DiME naturally penalizes such outcomes. We compare DiME with several baseline estimators of mutual information on a toy Gaussian dataset. We also provide example use cases for DiME, including latent factor disentanglement and a multiview representation learning problem in which DiME is used to learn a shared representation among views with high mutual information.
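To make the construction concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian kernel, an entropy order `alpha` close to 1, and one plausible reading of "difference of matrix-based entropies": the average joint entropy when the sample pairing is randomly permuted, minus the joint entropy of the original pairing. The function names `gaussian_gram`, `matrix_entropy`, and `dime_sketch`, and all default parameter values, are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gaussian-kernel Gram matrix for the rows of x (n samples by d features)."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    # Clamp tiny negative squared distances caused by round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def matrix_entropy(K, alpha=1.01):
    """Matrix-based Renyi entropy (in bits) of a positive semidefinite Gram matrix."""
    A = K / np.trace(K)  # normalize to unit trace
    eig = np.clip(np.linalg.eigvalsh(A), 0.0, None)  # drop round-off negatives
    return np.log2(np.sum(eig**alpha)) / (1.0 - alpha)


def dime_sketch(Kx, Ky, n_perm=5, alpha=1.01, seed=None):
    """Assumed form: mean joint entropy under permuted pairing minus paired joint entropy."""
    rng = np.random.default_rng(seed)
    n = Kx.shape[0]
    # The Hadamard product of two Gram matrices serves as the joint Gram matrix.
    paired = matrix_entropy(Kx * Ky, alpha)
    shuffled = []
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permuting rows and columns of Ky breaks the sample pairing with Kx.
        shuffled.append(matrix_entropy(Kx * Ky[np.ix_(p, p)], alpha))
    return float(np.mean(shuffled) - paired)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(64, 2))
    y_dep = x + 0.1 * rng.normal(size=(64, 2))  # strongly dependent on x
    y_ind = rng.normal(size=(64, 2))            # independent of x
    Kx = gaussian_gram(x)
    print("dependent:  ", dime_sketch(Kx, gaussian_gram(y_dep), seed=1))
    print("independent:", dime_sketch(Kx, gaussian_gram(y_ind), seed=1))
    print("constant:   ", dime_sketch(Kx, np.ones((64, 64)), seed=1))
```

Under this reading, a constant representation produces an all-ones Gram matrix that is unchanged by permutation, so the difference is zero. This gives one way to see why a collapsed, trivial solution earns no reward when the quantity is maximized.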