Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality. The main idea is to design a classification problem that distinguishes training data from samples drawn from a noise distribution $q$ that is easy to sample from, in a way that avoids computing the partition function. It is well known that the choice of $q$ can severely impact the computational and statistical efficiency of NCE. In practice, a common choice for $q$ is a Gaussian that matches the mean and covariance of the data. In this paper, we show that such a choice can make the conditioning of the Hessian of the loss exponentially bad in the ambient dimension, even for very simple data distributions. As a consequence, both the statistical and the algorithmic complexity of NCE with such a choice of $q$ will be problematic in practice, suggesting that more complex noise distributions are essential to the success of NCE.
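The classification idea above can be made concrete with a minimal NumPy sketch. It assumes a hypothetical one-dimensional exponential-family model $\log p_\theta(x) = \theta^\top T(x)$ with $T(x) = (1, x, x^2)$, where the constant feature lets NCE learn the log-normalizer as an ordinary parameter instead of computing the partition function. The noise $q$ is a Gaussian fitted to the data mean and variance. The sketch computes the empirical NCE logistic loss and its Hessian, the matrix whose conditioning the abstract discusses. The toy data, the feature map, and all function names are illustrative assumptions, not the paper's construction. A one-dimensional example also cannot exhibit the dimension-dependent degradation claimed above.

```python
import numpy as np


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def sufficient_stats(x):
    # Hypothetical 1-D exponential family: log p_theta(x) = theta @ T(x) with
    # T(x) = (1, x, x^2). The constant feature lets NCE learn the
    # log-normalizer as an ordinary parameter instead of computing it.
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)


def gaussian_logpdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def gaussian_natural_params(mean, var):
    # theta such that theta @ T(x) equals log N(x; mean, var).
    return np.array([
        -0.5 * np.log(2 * np.pi * var) - mean ** 2 / (2 * var),
        mean / var,
        -1.0 / (2 * var),
    ])


def nce_loss_and_hessian(theta, x_data, x_noise, mean, var):
    """Empirical NCE loss and Hessian, equal numbers of data and noise samples.

    The classifier logit is h(x) = log p_theta(x) - log q(x); data points are
    labelled 1 and noise points 0.
    """
    T_d, T_n = sufficient_stats(x_data), sufficient_stats(x_noise)
    h_d = T_d @ theta - gaussian_logpdf(x_data, mean, var)
    h_n = T_n @ theta - gaussian_logpdf(x_noise, mean, var)
    # Logistic loss -1/2 mean_data[log s(h)] - 1/2 mean_noise[log(1 - s(h))],
    # using log(1 - s(h)) = log s(-h).
    loss = -0.5 * np.mean(log_sigmoid(h_d)) - 0.5 * np.mean(log_sigmoid(-h_n))

    def weighted_moment(T, h):
        # h is linear in theta, so each sample contributes s(h)(1 - s(h)) T T^T.
        w = np.exp(log_sigmoid(h) + log_sigmoid(-h))
        return (T * w[:, None]).T @ T / len(h)

    H = 0.5 * weighted_moment(T_d, h_d) + 0.5 * weighted_moment(T_n, h_n)
    return loss, H


rng = np.random.default_rng(0)
# Toy bimodal data, chosen only for illustration.
x_data = np.concatenate([rng.normal(-3.0, 0.5, 2000), rng.normal(3.0, 0.5, 2000)])
# Noise q: a Gaussian matching the data mean and variance.
mean, var = x_data.mean(), x_data.var()
x_noise = rng.normal(mean, np.sqrt(var), x_data.size)

# Evaluate at the model that coincides with q (so h = 0 and the loss is log 2).
theta_q = gaussian_natural_params(mean, var)
loss, H = nce_loss_and_hessian(theta_q, x_data, x_noise, mean, var)
print(f"loss = {loss:.6f}, cond(H) = {np.linalg.cond(H):.3e}")
```

Evaluating at the $\theta$ for which $p_\theta = q$ gives a convenient sanity check. Every logit is zero, so the loss equals $\log 2$, and the Hessian is one eighth of the pooled second moment of $T(x)$. In this sketch the condition number mostly reflects the scaling of the features $(1, x, x^2)$, not the high-dimensional effect studied in the paper.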