Gaussian processes are used in many machine learning applications that rely on uncertainty quantification. Recently, computational tools for working with these models in geometric settings, such as when inputs lie on a Riemannian manifold, have been developed. This raises the question: can these intrinsic models be shown theoretically to lead to better performance, compared to simply embedding all relevant quantities into $\mathbb{R}^d$ and using the restriction of an ordinary Euclidean Gaussian process? To study this, we prove optimal contraction rates for intrinsic Mat\'ern Gaussian processes defined on compact Riemannian manifolds. We also prove analogous rates for extrinsic processes using trace and extension theorems between manifold and ambient Sobolev spaces: somewhat surprisingly, the rates obtained turn out to coincide with those of the intrinsic processes, provided that their smoothness parameters are matched appropriately. We illustrate these rates empirically on a number of examples, which, mirroring prior work, show that intrinsic processes can achieve better performance in practice. Therefore, our work shows that finer-grained analyses are needed to distinguish between different levels of data-efficiency of geometric Gaussian processes, particularly in settings which involve small data set sizes and non-asymptotic behavior.
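To make the intrinsic/extrinsic distinction concrete, the following is a minimal sketch on the unit circle, the simplest compact manifold, embedded in $\mathbb{R}^2$. It is not the implementation used in this work. The function names, the lengthscale parameterization, the choice $\nu = 3/2$, and the truncation level `n_terms` are all illustrative assumptions. The extrinsic kernel is an ordinary Euclidean Mat\'ern-3/2 kernel evaluated on the embedded points, so it depends on chordal distance. The intrinsic kernel is assumed to take the usual spectral form, a weighted series over Laplace-Beltrami eigenfunctions, truncated here to finitely many terms.

```python
import numpy as np


def extrinsic_matern32(theta, lengthscale=1.0):
    """Euclidean Matern-3/2 kernel restricted to circle points embedded in R^2."""
    pts = np.stack([np.cos(theta), np.sin(theta)], axis=-1)
    # Chordal (ambient Euclidean) distance between every pair of points.
    r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    s = np.sqrt(3.0) * r / lengthscale
    return (1.0 + s) * np.exp(-s)


def intrinsic_matern(theta, nu=1.5, lengthscale=1.0, n_terms=200):
    """Truncated spectral series for a Matern-type kernel on the circle (d = 1)."""
    n = np.arange(n_terms)
    # Laplace-Beltrami eigenvalues on the circle are n^2; the weights below
    # follow the assumed Matern spectral form (2*nu/lengthscale^2 + n^2)^(-nu - d/2).
    w = (2.0 * nu / lengthscale**2 + n**2) ** (-(nu + 0.5))
    w[1:] *= 2.0  # each eigenvalue n^2 with n >= 1 has multiplicity 2 (cos and sin)
    diff = theta[:, None] - theta[None, :]
    k = np.cos(diff[..., None] * n) @ w  # sum_n w_n * cos(n * (theta - theta'))
    return k / w.sum()  # normalize so that k(x, x) = 1


if __name__ == "__main__":
    theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
    K_ext = extrinsic_matern32(theta)
    K_int = intrinsic_matern(theta)
    # Both matrices are valid covariances on the same inputs, but they differ
    # because one sees chordal distance and the other the circle's own geometry.
    print("max abs difference:", np.abs(K_ext - K_int).max())
```

Both functions return a symmetric, positive semi-definite Gram matrix with unit diagonal, so either can serve as a Gaussian process prior covariance on the same inputs. The sketch only illustrates that the two constructions are different kernels; it says nothing about their contraction rates.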