Gaussian process regression is widely used because it provides well-calibrated uncertainty estimates and handles small or sparse datasets. However, it struggles with high-dimensional data. One possible way to scale this technique to higher dimensions is to leverage the implicit low-dimensional manifold on which the data actually lie, as postulated by the manifold hypothesis. Prior work, though, ordinarily requires the manifold structure to be provided explicitly, i.e., given by a mesh or known to be one of the well-known manifolds such as the sphere. In contrast, in this paper we propose a Gaussian process regression technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way. For the resulting model, we discuss its convergence to the Matérn Gaussian process on the assumed manifold. Our technique scales up to hundreds of thousands of data points and may improve the predictive performance and calibration of standard Gaussian process regression in high-dimensional settings.