This article focuses on drawing computationally efficient predictive inference from Gaussian process (GP) regressions with a large number of features when the response is conditionally independent of the features given their projection onto a noisy low-dimensional manifold. Bayesian estimation of the regression relationship using Markov chain Monte Carlo (MCMC) and subsequent predictive inference is computationally prohibitive and may lead to inferential inaccuracies, since accurate variable selection is essentially impossible in such high-dimensional GP regressions. As an alternative, this article proposes sketching the high-dimensional feature vector with a carefully constructed sketching matrix and then fitting a GP to the scalar outcome and the sketched feature vector to draw predictive inference. The analysis is performed in parallel across processors with many different sketching matrices and smoothing parameters, and the resulting predictive inferences are combined using Bayesian predictive stacking. Since the posterior predictive distribution in each processor is analytically tractable, the algorithm bypasses the robustness issues due to convergence and mixing of MCMC chains, leading to fast implementation with a very large number of features. Simulation studies show superior performance of the proposed approach relative to a wide variety of competitors. The approach also outperforms competitors in point prediction, with accompanying predictive uncertainties, of outdoor air pollution from satellite images.
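The pipeline described above (sketch the features, fit a GP per sketch with a closed-form predictive distribution, combine by stacking) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the Gaussian random sketching matrix scaled by 1/sqrt(p), the squared-exponential kernel with unit signal variance, the fixed noise variance, the single hold-out split used to score models, and the EM-style fixed-point update used to approximate the stacking weights. The function names (`sq_exp_kernel`, `gp_predict`, `stacking_weights`, `sketched_gp_stacking`) are hypothetical. The models are fitted in a plain loop here, although each one is independent and could run on its own processor.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

def gp_predict(Z_train, y_train, Z_new, length_scale, noise_var):
    # Closed-form GP posterior predictive mean and variance
    # (assumed: zero prior mean, unit signal variance, known noise variance).
    K = sq_exp_kernel(Z_train, Z_train, length_scale)
    K += noise_var * np.eye(len(Z_train))
    Ks = sq_exp_kernel(Z_new, Z_train, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    var = 1.0 + noise_var - (v**2).sum(0)
    return mean, var

def gaussian_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean)**2 / var) / np.sqrt(2.0 * np.pi * var)

def stacking_weights(dens, n_iter=500):
    # dens[i, k]: predictive density of held-out point i under model k.
    # EM-style fixed-point updates that increase the held-out log score
    # of the weighted mixture while keeping the weights on the simplex.
    dens = np.maximum(dens, 1e-300)  # guard against underflow to zero
    w = np.full(dens.shape[1], 1.0 / dens.shape[1])
    for _ in range(n_iter):
        resp = dens * w
        resp /= resp.sum(1, keepdims=True)
        w = resp.mean(0)
    return w

def sketched_gp_stacking(X, y, X_new, sketch_dim=5, length_scales=(1.0, 3.0),
                         n_sketches=4, noise_var=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.permutation(n)
    fit, hold = idx[: n // 2], idx[n // 2:]

    # One model per (sketching matrix, length scale) pair.
    models = [(rng.normal(size=(sketch_dim, p)) / np.sqrt(p), ls)
              for _ in range(n_sketches) for ls in length_scales]

    # Score every model by its predictive density on the hold-out half.
    dens = np.empty((len(hold), len(models)))
    for k, (P, ls) in enumerate(models):
        mu, var = gp_predict(X[fit] @ P.T, y[fit], X[hold] @ P.T, ls, noise_var)
        dens[:, k] = gaussian_pdf(y[hold], mu, var)
    w = stacking_weights(dens)

    # Refit on all data and combine the predictive distributions as a mixture.
    preds = [gp_predict(X @ P.T, y, X_new @ P.T, ls, noise_var) for P, ls in models]
    means = np.array([mu for mu, _ in preds])
    variances = np.array([var for _, var in preds])
    mix_mean = w @ means
    mix_var = w @ (variances + means**2) - mix_mean**2
    return w, mix_mean, mix_var
```

Calling `sketched_gp_stacking(X, y, X_new)` with an `n × p` feature matrix returns the stacking weights together with the mean and variance of the stacked predictive mixture at each row of `X_new`. No MCMC is involved at any step, since each component relies only on the Gaussian closed form.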