Gaussian process (GP) models that combine categorical and continuous input variables have found use in the analysis of longitudinal data and computer experiments. However, standard inference for these models scales cubically in the number of observations, and common scalable approximation schemes for GPs cannot be applied because the covariance function is discontinuous. In this work, we derive a basis function approximation scheme for mixed-domain covariance functions that scales linearly with respect to the number of observations and the total number of basis functions. The proposed approach is also naturally applicable to Bayesian GP regression with discrete observation models. We demonstrate the scalability of the approach and compare model reduction techniques for additive GP models in a longitudinal data context. We confirm that the approximation can match the exact GP model accurately in a fraction of the runtime required to fit the exact model. In addition, we demonstrate a scalable model reduction workflow for obtaining smaller and more interpretable models when dealing with a large number of candidate predictors.
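To make the linear-scaling idea concrete, the sketch below implements a standard one-dimensional reduced-rank (Hilbert-space) basis function approximation for a squared-exponential kernel, where inference costs O(nm² + m³) for n observations and m basis functions instead of the O(n³) of exact GP regression. This is only an illustration of the general basis-function approach, not the paper's mixed-domain scheme; the kernel choice, domain half-width `L`, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def basis_functions(x, m, L):
    # Laplacian eigenfunctions on [-L, L]: phi_j(x) = sin(sqrt(lam_j)(x + L)) / sqrt(L)
    j = np.arange(1, m + 1)
    sqrt_lam = j * np.pi / (2 * L)
    Phi = np.sin(np.outer(x + L, sqrt_lam)) / np.sqrt(L)
    return Phi, sqrt_lam**2  # basis matrix (n x m) and eigenvalues

def se_spectral_density(w2, sigma, ell):
    # Spectral density of the squared-exponential kernel, evaluated at
    # squared frequencies w2; gives the prior variance of each basis weight
    return sigma**2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * ell**2 * w2)

def approx_gp_posterior_mean(x, y, x_star, sigma=1.0, ell=0.3,
                             noise=0.1, m=32, L=3.0):
    # Reduced-rank GP regression: solve an m x m system instead of n x n
    Phi, lam = basis_functions(x, m, L)
    S = se_spectral_density(lam, sigma, ell)
    A = Phi.T @ Phi + noise**2 * np.diag(1.0 / S)
    w = np.linalg.solve(A, Phi.T @ y)        # posterior mean of basis weights
    Phi_star, _ = basis_functions(x_star, m, L)
    return Phi_star @ w
```

In this weight-space view the GP prior is approximated as f(x) ≈ Σⱼ wⱼ φⱼ(x) with wⱼ ~ N(0, S(√λⱼ)), so the cost of computing the posterior mean grows linearly in n for fixed m.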