Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.
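To make the notion of well-posedness above more concrete, the following minimal sketch computes the Hellinger distance between two univariate Gaussian predictive distributions, along with the ratio of that distance to the size of a data perturbation. It is an illustration only and does not reproduce the article's results. The function names `hellinger_gaussian` and `lipschitz_ratio` are hypothetical. The sketch also assumes the normalisation in which the squared Hellinger distance equals one minus the Bhattacharyya coefficient, so that the distance lies in [0, 1].

```python
import math


def hellinger_gaussian(mu1, sigma1, mu2, sigma2):
    """Hellinger distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Assumed convention: H^2 = 1 - BC, where BC is the Bhattacharyya
    coefficient, so the returned value lies in [0, 1].
    """
    if sigma1 <= 0 or sigma2 <= 0:
        # Degenerate (zero-variance) predictions, which arise at the data
        # locations in the noiseless setting, are not handled by this sketch.
        raise ValueError("standard deviations must be positive")
    var_sum = sigma1 ** 2 + sigma2 ** 2
    bc = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


def lipschitz_ratio(pred_a, pred_b, data_a, data_b):
    """Hellinger distance between two predictions divided by the Euclidean
    distance between the two data vectors that produced them.

    pred_a and pred_b are (mean, standard deviation) pairs at one test point.
    """
    perturbation = math.sqrt(sum((a - b) ** 2 for a, b in zip(data_a, data_b)))
    if perturbation == 0.0:
        raise ValueError("data vectors must differ")
    return hellinger_gaussian(*pred_a, *pred_b) / perturbation


if __name__ == "__main__":
    # Hypothetical predictions obtained from two nearby data sets.
    print(lipschitz_ratio((0.0, 1.0), (0.1, 1.2), [1.0, 2.0], [1.0, 2.01]))
```

In this sketch, a predictive map that is Lipschitz in the data would keep `lipschitz_ratio` bounded as the perturbation shrinks. A ratio that grows without bound along some sequence of perturbations corresponds to the kind of failure the abstract describes.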