The Representation Jensen-Shannon Divergence

Statistical divergences quantify the difference between probability distributions, which makes them useful across many machine learning tasks. However, a fundamental challenge is estimating these quantities from empirical samples, since the underlying distributions of the data are usually unknown. In this work, we propose a divergence inspired by the Jensen-Shannon divergence that avoids estimating probability density functions. Our approach embeds the data in a reproducing kernel Hilbert space (RKHS), where we associate data distributions with uncentered covariance operators in this representation space. We therefore name this measure the representation Jensen-Shannon divergence (RJSD). We provide an estimator based on empirical covariance matrices, obtained by explicitly mapping the data to an RKHS using Fourier features. This estimator is flexible, scalable, differentiable, and suitable for minibatch-based optimization problems. Additionally, we provide an estimator based on kernel matrices that requires no explicit mapping to the RKHS. We provide consistency convergence results for the proposed estimator. Moreover, we demonstrate that this quantity is a lower bound on the Jensen-Shannon divergence, which leads to a variational approach for estimating the Jensen-Shannon divergence with theoretical guarantees. We leverage the proposed divergence to train generative networks, where our method mitigates mode collapse and encourages sample diversity. Additionally, RJSD surpasses other state-of-the-art techniques in multiple two-sample testing problems, demonstrating superior performance and reliability in discriminating between distributions.
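
To make the covariance-based estimator concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state the formula, so the sketch assumes the divergence mirrors the Jensen-Shannon form with von Neumann entropy in place of Shannon entropy, S(C) = -Tr(C log C), applied to unit-trace uncentered covariance matrices: RJSD = S((C_X + C_Y)/2) - (S(C_X) + S(C_Y))/2. It also assumes a Gaussian kernel approximated with random cosine/sine Fourier features. The function names, bandwidth, and sample sizes are illustrative choices, not taken from the source.

```python
import numpy as np


def fourier_features(x, omega):
    """Random Fourier features for a Gaussian kernel (assumed kernel choice).

    Stacking cos and sin and scaling by 1/sqrt(n_freq) gives every feature
    vector unit norm, so the uncentered covariance below has trace 1.
    """
    proj = x @ omega
    n_freq = omega.shape[1]
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_freq)


def von_neumann_entropy(cov, eps=1e-12):
    """Entropy of the eigenvalue spectrum of a unit-trace PSD matrix."""
    eigvals = np.linalg.eigvalsh(cov)
    eigvals = eigvals[eigvals > eps]  # drop numerically zero eigenvalues
    return float(-np.sum(eigvals * np.log(eigvals)))


def rjsd(x, y, omega):
    """Assumed form: S((C_x + C_y) / 2) - (S(C_x) + S(C_y)) / 2."""
    phi_x = fourier_features(x, omega)
    phi_y = fourier_features(y, omega)
    cov_x = phi_x.T @ phi_x / len(x)  # uncentered covariance of sample x
    cov_y = phi_y.T @ phi_y / len(y)  # uncentered covariance of sample y
    cov_mix = 0.5 * (cov_x + cov_y)
    return von_neumann_entropy(cov_mix) - 0.5 * (
        von_neumann_entropy(cov_x) + von_neumann_entropy(cov_y)
    )


rng = np.random.default_rng(0)
dim, n_freq, sigma = 2, 50, 1.0  # illustrative settings
# Frequencies drawn from the spectral density of a Gaussian kernel.
omega = rng.normal(scale=1.0 / sigma, size=(dim, n_freq))

x = rng.normal(size=(500, dim))
x_same = rng.normal(size=(500, dim))             # same distribution as x
y_shifted = rng.normal(loc=3.0, size=(500, dim))  # shifted mean

d_same = rjsd(x, x_same, omega)
d_shift = rjsd(x, y_shifted, omega)
print(f"same distribution:    {d_same:.4f}")
print(f"shifted distribution: {d_shift:.4f}")
```

Under these assumptions the quantity is non-negative and at most log 2, and it should be noticeably larger for the shifted sample than for two samples drawn from the same distribution. Because every step is a matrix product or an eigenvalue computation, the same structure carries over to an automatic-differentiation framework for minibatch training.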
