Gaussian processes (GPs) are highly flexible, nonparametric statistical models that are commonly used to fit nonlinear relationships or to account for correlation between observations. However, the computational cost of fitting a GP is $\mathcal{O}(n^3)$ in the number of observations $n$, which makes GPs infeasible for large datasets. To address this, this research focuses on the use of minibatching to estimate GP parameters. Specifically, we outline both approximate and exact minibatch Markov chain Monte Carlo algorithms that substantially reduce the computation required to fit a GP by considering only small subsets of the data at a time. We demonstrate and compare these methods using various simulations and real datasets.
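To make the cost argument concrete, the following is a minimal sketch of where the $\mathcal{O}(n^3)$ term arises and how evaluating the likelihood on a subset of size $m$ reduces it to $\mathcal{O}(m^3)$. It is not the algorithm proposed in this work: the squared-exponential kernel, the zero-mean assumption, and the function names `rbf_kernel`, `gp_log_likelihood`, and `minibatch_log_likelihood` are all illustrative assumptions, and the way the approximate and exact MCMC schemes use such subsets is not reproduced here.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale, variance):
    """Squared-exponential covariance between two sets of 1-D inputs (assumed kernel)."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_log_likelihood(x, y, lengthscale, variance, noise):
    """Zero-mean GP log marginal likelihood of y given inputs x.

    The Cholesky factorization of the len(x)-by-len(x) covariance matrix
    is the cubic-cost step.
    """
    n = len(x)
    K = rbf_kernel(x, x, lengthscale, variance) + noise * np.eye(n)
    L = np.linalg.cholesky(K)          # K = L @ L.T
    alpha = np.linalg.solve(L, y)      # alpha = L^{-1} y, so alpha @ alpha = y^T K^{-1} y
    log_det_half = np.log(np.diag(L)).sum()  # equals 0.5 * log|K|
    return -0.5 * alpha @ alpha - log_det_half - 0.5 * n * np.log(2 * np.pi)


def minibatch_log_likelihood(x, y, batch_size, rng, **params):
    """Evaluate the same likelihood on a random subset of the data.

    This only illustrates the cost reduction; a naive subset likelihood
    is not, by itself, a valid replacement for the full likelihood.
    """
    idx = rng.choice(len(x), size=batch_size, replace=False)
    return gp_log_likelihood(x[idx], y[idx], **params)
```

For example, with `rng = np.random.default_rng(0)` and `params = dict(lengthscale=1.0, variance=1.0, noise=0.1)`, calling `minibatch_log_likelihood(x, y, 100, rng, **params)` on 10,000 observations factorizes a 100-by-100 matrix instead of a 10,000-by-10,000 one.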