We propose a new density estimation algorithm. Given $n$ i.i.d. samples from a distribution belonging to a class of densities on $\mathbb{R}^d$, our estimator outputs any density in the class whose ``perceptron discrepancy'' with the empirical distribution is at most $O(\sqrt{d/n})$. The perceptron discrepancy between two distributions is defined as the largest difference in mass that they place on any halfspace of $\mathbb{R}^d$. We show that this estimator achieves an expected total variation distance to the truth that is almost minimax optimal over the class of densities with bounded Sobolev norm and over the class of Gaussian mixtures. This suggests that regularity of the prior distribution could explain the efficiency of a ubiquitous step in machine learning: replacing optimization over large function spaces with optimization over simpler parametric classes (e.g. in the discriminators of GANs). We generalize the above to show that replacing the perceptron discrepancy with the generalized energy distance of Sz\'ekely-Rizzo further improves the total variation loss. The generalized energy distance between empirical distributions is easily computable and differentiable, which makes it especially useful for fitting generative models. To the best of our knowledge, this is the first example of a distance with such properties for which there are minimax statistical guarantees.
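To illustrate why the generalized energy distance between empirical distributions is easy to compute, here is a minimal sketch in plain Python. It assumes one common convention, $\mathcal{E}_\beta(P,Q) = 2\,\mathbb{E}\|X-Y\|^\beta - \mathbb{E}\|X-X'\|^\beta - \mathbb{E}\|Y-Y'\|^\beta$ with $0 < \beta < 2$, evaluated by averaging over all sample pairs. The function names, the parameter name `beta`, and the choice of normalization (no square root, all pairs including $i=j$) are assumptions for illustration and may differ from the definition used in the paper.

```python
import math


def mean_pairwise_power(xs, ys, beta):
    """Average of ||x - y||^beta over all pairs (x, y), x in xs and y in ys."""
    total = 0.0
    for x in xs:
        for y in ys:
            total += math.dist(x, y) ** beta
    return total / (len(xs) * len(ys))


def generalized_energy_distance(xs, ys, beta=1.0):
    """Plug-in estimate of 2 E|X-Y|^beta - E|X-X'|^beta - E|Y-Y'|^beta.

    xs, ys: non-empty sequences of points, each a tuple of floats of equal length.
    beta:   exponent, assumed here to lie strictly between 0 and 2.
    """
    if not 0.0 < beta < 2.0:
        raise ValueError("beta is assumed to lie in the open interval (0, 2)")
    cross = mean_pairwise_power(xs, ys, beta)
    within_x = mean_pairwise_power(xs, xs, beta)
    within_y = mean_pairwise_power(ys, ys, beta)
    return 2.0 * cross - within_x - within_y
```

For example, `generalized_energy_distance([(0.0, 0.0)], [(3.0, 4.0)])` compares two point masses at Euclidean distance 5 and returns `10.0` under this convention. The expression is a finite sum of powers of pairwise distances, so its cost is quadratic in the sample sizes, and it is differentiable in the sample locations wherever the points involved are distinct.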