This article focuses on drawing computationally efficient predictive inference from Gaussian process (GP) regressions with a large number of features when the response is conditionally independent of the features given their projection onto a noisy low-dimensional manifold. Bayesian estimation of the regression relationship using Markov chain Monte Carlo (MCMC) and subsequent predictive inference is computationally prohibitive and may lead to inferential inaccuracies, since accurate variable selection is essentially impossible in such high-dimensional GP regressions. As an alternative, this article proposes a strategy that sketches the high-dimensional feature vector with a carefully constructed sketching matrix and then fits a GP to the scalar outcome and the sketched feature vector to draw predictive inference. The analysis is performed in parallel on different processors, each using a different sketching matrix and smoothing parameter, and the resulting predictive inferences are combined using Bayesian predictive stacking. Since the posterior predictive distribution on each processor is analytically tractable, the algorithm bypasses the robustness issues caused by the convergence and mixing of MCMC chains, leading to fast implementation with a very large number of features. Simulation studies show superior performance of the proposed approach relative to a wide variety of competitors. The approach also outperforms competitors in point prediction and predictive uncertainty quantification of outdoor air pollution from satellite images.
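The pipeline described above (sketch the features, fit a GP with a closed-form posterior predictive, combine the fits by stacking) can be illustrated with a minimal NumPy sketch. Everything specific in it is an assumption made for illustration and is not taken from the article: the Gaussian random sketching matrix, the squared-exponential kernel with unit signal variance and fixed noise variance, the single hold-out split used to score each fit, and the EM-style fixed-point update used to maximize the stacking log score. The function names (`rbf_kernel`, `gp_predict`, `stacking_weights`, `sketched_gp_stack`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(Xtr, ytr, Xte, length_scale, noise_var):
    # Closed-form GP posterior predictive mean and variance
    # (zero prior mean, unit signal variance: both are assumptions).
    K = rbf_kernel(Xtr, Xtr, length_scale) + noise_var * np.eye(len(Xtr))
    Ks = rbf_kernel(Xte, Xtr, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    V = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    var = 1.0 + noise_var - (V**2).sum(0)
    return mean, var

def normal_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def stacking_weights(dens, n_iter=200):
    # dens[i, k]: predictive density of fit k at held-out point i.
    # EM-style fixed-point updates that increase the mean log mixture
    # density over the simplex (one simple way to get stacking weights).
    dens = np.maximum(dens, 1e-300)  # guard against underflow
    w = np.full(dens.shape[1], 1.0 / dens.shape[1])
    for _ in range(n_iter):
        mix = dens @ w
        w = w * (dens / mix[:, None]).mean(0)
    return w / w.sum()

def sketched_gp_stack(X, y, X_new, m=5, length_scales=(1.0, 3.0),
                      n_sketch=4, noise_var=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.permutation(n)
    fit, hold = idx[: n // 2], idx[n // 2:]  # hold-out split (assumption)
    dens, means, variances = [], [], []
    for _ in range(n_sketch):
        # Hypothetical Gaussian sketching matrix mapping p features to m.
        P = rng.normal(size=(m, p)) / np.sqrt(p)
        Z, Z_new = X @ P.T, X_new @ P.T
        for ls in length_scales:
            # Score this (sketch, length-scale) pair on the held-out half.
            mu_h, var_h = gp_predict(Z[fit], y[fit], Z[hold], ls, noise_var)
            dens.append(normal_pdf(y[hold], mu_h, var_h))
            # Refit on all data to predict at the new inputs.
            mu, var = gp_predict(Z, y, Z_new, ls, noise_var)
            means.append(mu)
            variances.append(var)
    w = stacking_weights(np.column_stack(dens))
    means, variances = np.array(means), np.array(variances)
    mean = w @ means
    # Variance of the stacked Gaussian mixture.
    var = w @ (variances + means**2) - mean**2
    return mean, var, w
```

Calling `sketched_gp_stack(X, y, X_new)` with an `n x p` feature matrix returns the stacked predictive mean and variance at each row of `X_new`, together with one weight per (sketch, length-scale) pair. Each pair is fit independently inside the double loop, so in a parallel setting each iteration could be sent to its own processor before the weights are computed.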