In the sciences, regression tasks often require predicting high-dimensional outputs from few training examples. Multi-output Gaussian processes excel in low-data regimes but typically struggle with high-dimensional outputs. Compress-then-predict pipelines such as PCA-GP (principal component analysis plus Gaussian process regression) handle high dimensionality, but rely on bases optimized for reconstruction rather than prediction. To address this gap, we propose a model that represents each output as a linear-Gaussian decoding of a low-dimensional latent state drawn from a Gaussian process prior. By analytically marginalizing the decoder weights, we couple compression and prediction in a single objective that scales to high-dimensional outputs. We refer to this model as Gaussian process latent factor regression (GPLFR). We demonstrate GPLFR by building the first spatially resolved emulator of global climate models for rocky exoplanets.
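The marginalization step described above can be illustrated with a minimal NumPy sketch. It rests on assumptions that are ours, not necessarily the paper's. Outputs follow Y = ZW + ε, where each column of the decoder weights W is drawn from N(0, I) and the noise is isotropic with variance σ². Each latent dimension of Z has an independent GP prior with a squared-exponential kernel. Under these assumptions, integrating out W makes every output column Gaussian with covariance ZZᵀ + σ²I. The data therefore enter only through the N × N matrix YYᵀ, so once that matrix is computed, each evaluation of the objective no longer grows with the output dimension D. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix for inputs X of shape (N, P)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def _logdet_and_trace(C, S):
    """Return log|C| (via Cholesky) and tr(C^{-1} S) for symmetric positive-definite C."""
    L = np.linalg.cholesky(C)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    trace = np.trace(np.linalg.solve(C, S))
    return logdet, trace


def log_lik_marginal_decoder(YYt, D, Z, noise_var):
    """Hypothetical helper: log p(Y | Z) with the K x D decoder weights W integrated out.

    Assumes Y = Z W + eps, w_d ~ N(0, I_K) for each output column, and
    eps ~ N(0, noise_var I). Then y_d ~ N(0, Z Z^T + noise_var I)
    independently for d = 1..D, so Y enters only through Y Y^T.
    """
    N = Z.shape[0]
    C = Z @ Z.T + noise_var * np.eye(N)
    logdet, trace = _logdet_and_trace(C, YYt)
    return -0.5 * (D * logdet + trace + N * D * np.log(2.0 * np.pi))


def log_gp_prior(Z, Kx, jitter=1e-6):
    """Hypothetical helper: log p(Z | X), with each latent column z_k ~ N(0, Kx) independently."""
    N, K = Z.shape
    C = Kx + jitter * np.eye(N)
    logdet, trace = _logdet_and_trace(C, Z @ Z.T)
    return -0.5 * (K * logdet + trace + N * K * np.log(2.0 * np.pi))


# Usage on synthetic data: N = 20 inputs, D = 5000 outputs, K = 3 latent dimensions.
rng = np.random.default_rng(0)
N, P, K, D = 20, 2, 3, 5000
X = rng.uniform(size=(N, P))
Kx = rbf_kernel(X, lengthscale=0.3)
Z = np.linalg.cholesky(Kx + 1e-6 * np.eye(N)) @ rng.standard_normal((N, K))
Y = Z @ rng.standard_normal((K, D)) + 0.1 * rng.standard_normal((N, D))

YYt = Y @ Y.T  # a single O(N^2 D) pass over the data; later evaluations cost O(N^3)
objective = log_lik_marginal_decoder(YYt, D, Z, noise_var=0.01) + log_gp_prior(Z, Kx)
print(objective)
```

The sketch only evaluates a joint log density, log p(Y | Z) + log p(Z | X). It does not specify how such an objective would be optimized (for example, over Z and the kernel hyperparameters) or how predictions at new inputs would be formed.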