Gaussian process regression is a widely used statistical method for flexible yet fully probabilistic non-linear regression modeling. A common obstacle is its computational complexity, which scales poorly with the number of observations. This is a particular problem when Gaussian process models are applied to multiple functions simultaneously, as in many applications of functional data analysis. We consider a multi-level Gaussian process regression model in which a common mean function and individual subject-specific deviations are modeled simultaneously as latent Gaussian processes. We derive exact analytic and computationally efficient expressions for the log-likelihood function and the posterior distributions for the case where the observations are sampled on either a completely or partially regular grid. This enables us to fit the model to large data sets that are computationally inaccessible with a standard implementation. We show through a simulation study that our analytic expressions are several orders of magnitude faster than a standard implementation, and we provide an implementation in the probabilistic programming language Stan.
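To illustrate why a completely regular grid helps, the following is a minimal NumPy sketch, not the expressions derived in the paper and not the accompanying Stan code. It assumes a zero-mean prior on the common mean function, i.i.d. Gaussian noise with standard deviation `sigma`, and squared-exponential kernels; the function names (`sq_exp_kernel`, `loglik_dense`, `loglik_structured`) are hypothetical. When all n subjects share the same m grid points, the marginal covariance of the stacked observations has the form J ⊗ K_mu + I ⊗ (K_eta + sigma² I), where J is the n × n matrix of ones. Rotating the subject dimension onto the subject average and its orthogonal complement block-diagonalizes this matrix, so the log-likelihood needs only two m × m Cholesky factorizations instead of one of size nm × nm.

```python
import numpy as np


def sq_exp_kernel(t, magnitude, length_scale):
    """Squared-exponential kernel matrix on grid t (an assumed kernel choice)."""
    d = t[:, None] - t[None, :]
    return magnitude**2 * np.exp(-0.5 * (d / length_scale) ** 2)


def loglik_dense(Y, K_mu, K_eta, sigma):
    """Reference log-likelihood using the full (n*m) x (n*m) covariance."""
    n, m = Y.shape
    A = K_eta + sigma**2 * np.eye(m)           # per-subject covariance
    # Block (i, j) of C is K_mu + [i == j] * A.
    C = np.kron(np.ones((n, n)), K_mu) + np.kron(np.eye(n), A)
    y = Y.reshape(-1)                          # stack subjects row by row
    L = np.linalg.cholesky(C)
    z = np.linalg.solve(L, y)
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (z @ z + log_det + n * m * np.log(2.0 * np.pi))


def loglik_structured(Y, K_mu, K_eta, sigma):
    """Same log-likelihood, using only m x m factorizations.

    The rotated covariance has one block n*K_mu + A (acting on sqrt(n) times
    the subject average) and n - 1 identical blocks A (acting on deviations
    from the average).
    """
    n, m = Y.shape
    A = K_eta + sigma**2 * np.eye(m)
    B = n * K_mu + A
    y_bar = Y.mean(axis=0)
    R = Y - y_bar                              # deviations from the average
    L_A = np.linalg.cholesky(A)
    L_B = np.linalg.cholesky(B)
    z_B = np.linalg.solve(L_B, np.sqrt(n) * y_bar)
    Z_A = np.linalg.solve(L_A, R.T)            # all n deviations at once
    quad = z_B @ z_B + np.sum(Z_A**2)
    log_det = (2.0 * np.log(np.diag(L_B)).sum()
               + 2.0 * (n - 1) * np.log(np.diag(L_A)).sum())
    return -0.5 * (quad + log_det + n * m * np.log(2.0 * np.pi))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 1.0, 20)              # shared regular grid
    K_mu = sq_exp_kernel(t, 1.0, 0.3)
    K_eta = sq_exp_kernel(t, 0.5, 0.1)
    Y = rng.normal(size=(8, 20))               # placeholder data, 8 subjects
    print(loglik_dense(Y, K_mu, K_eta, 0.2))
    print(loglik_structured(Y, K_mu, K_eta, 0.2))
```

Under these assumptions the dense evaluation costs on the order of (nm)³ operations, while the structured one costs on the order of m³ + nm², which is consistent with the kind of speed-up the abstract reports. The partially regular grid case treated in the paper is not covered by this sketch.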