Gaussian processes are used in many machine learning applications that rely on uncertainty quantification. Recently, computational tools have been developed for working with these models in geometric settings, such as when inputs lie on a Riemannian manifold. This raises the question: can these intrinsic models be shown theoretically to outperform the simpler approach of embedding all relevant quantities into $\mathbb{R}^d$ and using the restriction of an ordinary Euclidean Gaussian process? To study this, we prove optimal contraction rates for intrinsic Matérn Gaussian processes defined on compact Riemannian manifolds. We also prove analogous rates for extrinsic processes using trace and extension theorems between manifold and ambient Sobolev spaces. Somewhat surprisingly, the rates obtained coincide with those of the intrinsic processes, provided that their smoothness parameters are matched appropriately. We illustrate these rates empirically on a number of examples which, mirroring prior work, show that intrinsic processes can achieve better performance in practice. Therefore, our work shows that finer-grained analyses are needed to distinguish geometric Gaussian processes by their data efficiency, particularly in settings that involve small data sets and non-asymptotic behavior.
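To make the intrinsic/extrinsic distinction concrete, the following is a minimal sketch on the simplest compact manifold, the unit circle $S^1$ embedded in $\mathbb{R}^2$. The intrinsic kernel assumes the standard spectral form of the Riemannian Matérn kernel, built from the Laplace–Beltrami eigenpairs of the circle (eigenvalues $n^2$, eigenfunctions $\cos nt$, $\sin nt$). The extrinsic kernel is an ordinary Euclidean Matérn-3/2 kernel evaluated on the embedded points, so it only sees chordal distances. The function names, the truncation level `n_terms`, and the parameters `kappa` and `lengthscale` are hypothetical choices for illustration. This is not the implementation used in this work, and no claim is made that the two smoothness parameters here are "matched" in the sense of the rates above.

```python
import numpy as np


def intrinsic_matern_circle(t1, t2, nu=1.5, kappa=1.0, n_terms=512):
    """Hypothetical helper: intrinsic Matérn kernel on S^1, normalized so k(t, t) = 1.

    Assumes the spectral definition
        k(x, y) ∝ sum_n (2 nu / kappa^2 + lambda_n)^(-nu - d/2) f_n(x) f_n(y),
    with d = 1, lambda_n = n^2. For each n >= 1 the eigenfunction pair
    cos(n t), sin(n t) sums to cos(n (t1 - t2)). The series is truncated
    after n_terms frequencies.
    """
    d = 1
    n = np.arange(n_terms)
    weights = (2.0 * nu / kappa**2 + n**2) ** (-nu - d / 2.0)
    weights[1:] *= 2.0  # each n >= 1 has two eigenfunctions (cos and sin)
    diff = np.subtract.outer(np.atleast_1d(t1), np.atleast_1d(t2))
    k = np.cos(diff[..., None] * n) @ weights  # truncated spectral sum
    return k / weights.sum()  # value at diff = 0 equals weights.sum()


def extrinsic_matern32_circle(t1, t2, lengthscale=1.0):
    """Hypothetical helper: Euclidean Matérn-3/2 on R^2 restricted to the circle.

    Angles t are embedded as (cos t, sin t); the Euclidean kernel then depends
    on the chordal distance 2 |sin((t1 - t2) / 2)|, not on arc length.
    """
    diff = np.subtract.outer(np.atleast_1d(t1), np.atleast_1d(t2))
    r = 2.0 * np.abs(np.sin(diff / 2.0))  # chordal distance in R^2
    s = np.sqrt(3.0) * r / lengthscale
    return (1.0 + s) * np.exp(-s)


# 50 equally spaced angles on the circle.
theta = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
K_int = intrinsic_matern_circle(theta, theta)
K_ext = extrinsic_matern32_circle(theta, theta)

# Both Gram matrices should be positive semi-definite up to round-off.
print(np.linalg.eigvalsh(K_int).min(), np.linalg.eigvalsh(K_ext).min())
```

The circle is used because its spectrum is known in closed form. On a general compact Riemannian manifold the Laplace–Beltrami eigenpairs would have to be computed numerically, whereas the extrinsic construction only needs the embedding and a Euclidean kernel.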