Gaussian processes (GPs) are sophisticated distributions for modeling functional data. Whilst theoretically appealing, they are computationally cumbersome except for small datasets. We implement two methods for scaling GP inference in Stan: first, a general sparse approximation using a directed acyclic dependency graph; second, a fast, exact method that uses the fast Fourier transform for regularly spaced data modeled by GPs with stationary kernels. Based on benchmark experiments, we offer guidance to help practitioners choose between different methods and parameterizations. We use two real-world examples to illustrate the package. The implementation follows Stan's design and exposes performant inference through a familiar interface. Full posterior inference for ten thousand data points is feasible on a laptop in less than 20 seconds. We also explain how to get started with the popular interfaces cmdstanpy for Python and cmdstanr for R.
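The FFT-based approach rests on a standard identity: on a regular grid with periodic boundary conditions, the covariance matrix of a stationary GP is circulant, so the discrete Fourier transform diagonalizes it. The exact log density then costs O(n log n) instead of the O(n^3) of a Cholesky factorization. The sketch below illustrates only that identity in NumPy. It assumes a periodic grid and a wrapped squared-exponential kernel plus white noise. The function names `periodic_rbf_first_row` and `fft_gp_logpdf` are hypothetical and are not the package's Stan API. Handling non-periodic data would require extra steps that are not shown here.

```python
import numpy as np


def periodic_rbf_first_row(n, sigma, length_scale, noise_var, n_images=3):
    """First row of a symmetric circulant covariance for n equally spaced points on a ring.

    Hypothetical helper: the squared-exponential kernel is wrapped around the ring by
    summing over periodic images, which keeps the matrix positive definite. A white-noise
    variance is added on the diagonal.
    """
    j = np.arange(n)
    d = np.minimum(j, n - j)  # circular distance, so row[j] == row[n - j] exactly
    row = np.zeros(n)
    for m in range(-n_images, n_images + 1):
        row += sigma**2 * np.exp(-((d + m * n) ** 2) / (2 * length_scale**2))
    row[0] += noise_var
    return row


def fft_gp_logpdf(y, row):
    """Exact log density of y ~ N(0, C), where C is symmetric circulant with first row `row`."""
    n = y.size
    # The eigenvalues of a circulant matrix are the DFT of its first row (real here by symmetry).
    lam = np.fft.fft(row).real
    y_hat = np.fft.fft(y)
    # y^T C^{-1} y = (1/n) * sum_k |y_hat_k|^2 / lam_k  (unnormalized DFT convention)
    quad = np.sum(np.abs(y_hat) ** 2 / lam) / n
    logdet = np.sum(np.log(lam))
    return -0.5 * (quad + logdet + n * np.log(2 * np.pi))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 64
    row = periodic_rbf_first_row(n, sigma=1.0, length_scale=2.0, noise_var=0.1)
    y = rng.standard_normal(n)
    print(fft_gp_logpdf(y, row))
```

A dense check is a useful sanity test. Build `C[i, k] = row[(k - i) % n]` and compare the result with a direct `np.linalg.slogdet` and `np.linalg.solve` evaluation. For small n the two should agree to floating-point precision.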