Statistical divergences quantify the difference between probability distributions and have many uses in machine learning. A fundamental challenge, however, is estimating a divergence from empirical samples, since the underlying distributions of the data are usually unknown. In this work, we propose the representation Jensen-Shannon divergence, a novel divergence based on covariance operators in reproducing kernel Hilbert spaces (RKHS). Our approach embeds the data distributions in an RKHS and exploits the spectrum of the covariance operators of the representations. We provide an estimator based on empirical covariance matrices, obtained by explicitly mapping the data to an RKHS using Fourier features. This estimator is flexible, scalable, differentiable, and suitable for minibatch-based optimization problems. Additionally, we provide an estimator based on kernel matrices that does not require an explicit mapping to the RKHS. We show that this quantity is a lower bound on the Jensen-Shannon divergence, and we propose a variational approach to estimate it. We apply our divergence to two-sample testing, where it outperforms related state-of-the-art techniques on several datasets. We also use the representation Jensen-Shannon divergence as a cost function to train generative adversarial networks; this objective intrinsically avoids mode collapse and encourages diversity.
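The covariance-based estimator described above can be illustrated with a minimal NumPy sketch. The abstract does not state the formula, so the code assumes the divergence takes the usual Jensen-Shannon form applied to von Neumann entropies, S(C_mix) - (S(C_X) + S(C_Y)) / 2, with C_mix the average of the two empirical covariance matrices. The function names (`fourier_features`, `von_neumann_entropy`, `representation_jsd`), the Gaussian frequency matrix `omega`, and the equal mixture weights are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def fourier_features(x, omega):
    """Map samples x of shape (n, d) to 2*D random Fourier features.

    omega has shape (d, D). Each output row has unit Euclidean norm,
    so the uncentered covariance built from the rows has trace 1.
    """
    proj = x @ omega
    n_freq = omega.shape[1]
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(n_freq)

def von_neumann_entropy(cov, eps=1e-12):
    """Entropy -sum(l * log(l)) of the eigenvalues of a trace-one PSD matrix."""
    eig = np.linalg.eigvalsh(cov)
    eig = eig[eig > eps]  # drop zero (or slightly negative) eigenvalues
    return float(-np.sum(eig * np.log(eig)))

def representation_jsd(x, y, omega):
    """Assumed form: S((C_x + C_y) / 2) - (S(C_x) + S(C_y)) / 2."""
    phi_x = fourier_features(x, omega)
    phi_y = fourier_features(y, omega)
    cov_x = phi_x.T @ phi_x / phi_x.shape[0]  # uncentered empirical covariance
    cov_y = phi_y.T @ phi_y / phi_y.shape[0]
    cov_mix = 0.5 * (cov_x + cov_y)
    return von_neumann_entropy(cov_mix) - 0.5 * (
        von_neumann_entropy(cov_x) + von_neumann_entropy(cov_y)
    )

rng = np.random.default_rng(0)
omega = rng.normal(size=(2, 16))             # assumed Gaussian frequencies
x = rng.normal(loc=0.0, size=(200, 2))
y = rng.normal(loc=3.0, size=(200, 2))
d_same = representation_jsd(x, x, omega)     # identical samples
d_diff = representation_jsd(x, y, omega)     # shifted distribution
```

Under these assumptions the value is zero for identical samples, non-negative in general because the von Neumann entropy is concave, and at most log 2. In a differentiable framework, `omega` could be treated as a learnable parameter, which is one way to read the abstract's remark about minibatch-based optimization.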