This paper focuses on interpretable additive Gaussian process (GP) regression and its efficient implementation for large-scale data with a multi-dimensional grid structure, as commonly encountered in spatio-temporal analysis. A popular and scalable approach in the GP literature for this type of data exploits the Kronecker product structure in the covariance matrix. However, under the existing methodology, its use is limited to covariance functions with a separable product structure, which restricts flexibility in modelling and selecting interaction effects, an important component in many real-life problems. To address these issues, we propose a class of additive GP models constructed from hierarchical ANOVA kernels. Furthermore, we show how the Kronecker method can be extended to the proposed class of models. Our approach allows for easy identification of interaction effects, straightforward interpretation of both main and interaction effects, and efficient implementation for large-scale data. The proposed method is applied to analyse NO₂ concentrations during the COVID-19 lockdown in London. Our scalable method enables analysis of hourly recorded data collected from 59 different stations across the city, adding insight to findings from previous research that used daily or weekly averaged data.
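To make the Kronecker idea concrete, the following is a minimal sketch of the standard separable case only, not the extension to hierarchical ANOVA kernels proposed in the paper. It assumes a two-dimensional grid (for example, time by station) with a squared-exponential kernel on each dimension and Gaussian noise; the names `rbf_kernel`, `kron_solve`, and all numerical values are hypothetical choices for illustration. The linear system with covariance K1 ⊗ K2 + σ²I is solved using only the eigendecompositions of the two small per-dimension matrices, so the full covariance matrix is never formed.

```python
import numpy as np


def rbf_kernel(x, lengthscale=1.0):
    """Squared-exponential kernel matrix for 1-D inputs (illustrative choice)."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def kron_solve(K1, K2, y, noise_var):
    """Solve (K1 kron K2 + noise_var * I) alpha = y without forming the Kronecker product.

    y is ordered row-major over the grid: index = i1 * n2 + i2.
    """
    n1, n2 = K1.shape[0], K2.shape[0]
    lam1, Q1 = np.linalg.eigh(K1)  # K1 = Q1 diag(lam1) Q1^T
    lam2, Q2 = np.linalg.eigh(K2)  # K2 = Q2 diag(lam2) Q2^T
    Y = y.reshape(n1, n2)
    # (Q1 kron Q2)^T y corresponds to Q1^T Y Q2 in matrix form.
    T = Q1.T @ Y @ Q2
    # Eigenvalues of K1 kron K2 are all pairwise products lam1[i] * lam2[j].
    T = T / (np.outer(lam1, lam2) + noise_var)
    # Rotate back: (Q1 kron Q2) vec(T) corresponds to Q1 T Q2^T.
    return (Q1 @ T @ Q2.T).reshape(-1)


# Small synthetic grid: 6 "time points" by 5 "stations" (sizes are arbitrary).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 6)
s = np.linspace(0.0, 1.0, 5)
K1 = rbf_kernel(t, lengthscale=0.3)
K2 = rbf_kernel(s, lengthscale=0.5)
y = rng.standard_normal(K1.shape[0] * K2.shape[0])
noise_var = 0.1

alpha_fast = kron_solve(K1, K2, y, noise_var)

# Dense reference solution, feasible only because the grid is tiny.
K_full = np.kron(K1, K2) + noise_var * np.eye(y.size)
alpha_dense = np.linalg.solve(K_full, y)
```

On this small grid the two solutions agree to numerical precision. For a grid of size n1 by n2, the structured solve costs on the order of n1³ + n2³ operations for the eigendecompositions rather than (n1·n2)³ for the dense solve, which is what makes grid-structured data at this scale tractable.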