Statistical divergences quantify the difference between probability distributions and have many uses in machine learning. A fundamental challenge, however, is estimating a divergence from empirical samples, since the underlying distributions of the data are usually unknown. In this work, we propose the representation Jensen-Shannon divergence, a novel divergence based on covariance operators in reproducing kernel Hilbert spaces (RKHS). Our approach embeds the data distributions in an RKHS and exploits the spectrum of the covariance operators of the representations. We provide an estimator based on empirical covariance matrices, obtained by explicitly mapping the data to an RKHS using Fourier features. This estimator is flexible, scalable, differentiable, and suitable for minibatch-based optimization problems. Additionally, we provide an estimator based on kernel matrices that does not require an explicit mapping to the RKHS. We show that this quantity is a lower bound on the Jensen-Shannon divergence, and we propose a variational approach to estimate it. We apply our divergence to two-sample testing, where it outperforms related state-of-the-art techniques on several datasets. We also use the representation Jensen-Shannon divergence as a cost function to train generative adversarial networks, which intrinsically avoids mode collapse and encourages diversity.
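To make the covariance-based estimator more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It rests on several assumptions that the abstract does not spell out: a Gaussian kernel approximated with random Fourier features built from cosine and sine pairs, uncentered empirical covariance matrices with unit trace, the von Neumann entropy as the spectral functional, and an equal-weight mixture of the two covariance matrices. The function names and the parameters `n_frequencies`, `sigma`, and `seed` are hypothetical.

```python
import numpy as np


def fourier_features(x, omega):
    """Map rows of x to unit-norm random Fourier features (cos/sin pairs)."""
    proj = x @ omega  # shape (n_samples, n_frequencies)
    n_freq = omega.shape[1]
    # cos^2 + sin^2 = 1 per frequency, so each feature vector has norm 1.
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_freq)


def von_neumann_entropy(cov):
    """Entropy of the eigenvalue spectrum of a unit-trace PSD matrix."""
    eigvals = np.linalg.eigvalsh(cov)
    eigvals = eigvals[eigvals > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(eigvals * np.log(eigvals)))


def representation_jsd(x, y, n_frequencies=64, sigma=1.0, seed=0):
    """Sketch of a covariance-based divergence between samples x and y.

    Assumption: the divergence is the entropy of the mixture covariance
    minus the average entropy of the two individual covariances.
    """
    rng = np.random.default_rng(seed)
    # Frequencies drawn from the spectral density of a Gaussian kernel
    # with bandwidth sigma (assumed kernel choice).
    omega = rng.normal(scale=1.0 / sigma, size=(x.shape[1], n_frequencies))
    phi_x = fourier_features(x, omega)
    phi_y = fourier_features(y, omega)
    # Uncentered empirical covariance matrices; trace is 1 by construction.
    cov_x = phi_x.T @ phi_x / phi_x.shape[0]
    cov_y = phi_y.T @ phi_y / phi_y.shape[0]
    cov_mix = 0.5 * (cov_x + cov_y)
    return von_neumann_entropy(cov_mix) - 0.5 * (
        von_neumann_entropy(cov_x) + von_neumann_entropy(cov_y)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    same_a = rng.normal(size=(500, 2))
    same_b = rng.normal(size=(500, 2))
    shifted = rng.normal(loc=3.0, size=(500, 2))
    print("same distribution:   ", representation_jsd(same_a, same_b))
    print("shifted distribution:", representation_jsd(same_a, shifted))
```

Under these assumptions the value lies between 0 and log 2, and every step is a composition of matrix products and an eigendecomposition, which is consistent with the abstract's description of the estimator as differentiable and suitable for minibatches. A NumPy version does not compute gradients; an automatic-differentiation framework would be needed for training.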