Supervised Learning of Functional Outcomes with Predictors at Different Scales: A Functional Gaussian Process Approach

The analysis of complex computer simulations, which often produce functional data, presents unique statistical challenges. Conventional regression methods, such as function-on-function regression, typically relate functional outcomes to scalar and functional predictors on a per-realization basis. Simulation studies, however, often call for a more nuanced approach that disentangles the nonlinear relationships between a functional outcome and predictors observed at multiple scales: domain-specific functional predictors that are fixed across simulation runs, and realization-specific global predictors that vary between runs. In this article, we develop a novel supervised learning framework tailored to this setting. We propose an additive nonlinear regression model that flexibly captures the influence of both predictor types. The effects of functional predictors are modeled through spatially varying coefficients governed by a Gaussian process prior. Crucially, to capture the impact of global predictors on the functional outcome, we introduce a functional Gaussian process (fGP) prior. This new prior jointly models the entire collection of unknown, spatially indexed nonlinear functions that encode the effects of the global predictors over the domain, explicitly accounting for their spatial dependence. This integrated architecture enables simultaneous learning from both predictor types, provides principled strategies to quantify their respective contributions to predicting the functional outcome, and delivers rigorous uncertainty estimates for both model parameters and predictions. The utility and robustness of our approach are demonstrated on multiple synthetic datasets and a real-world application involving outputs from the Sea, Lake, and Overland Surges from Hurricanes (SLOSH) model.
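To make the additive structure concrete, here is a minimal simulation sketch in plain Python. It is not the authors' implementation. The squared-exponential kernels, the lengthscales, the 1-D domain, and above all the choice to stand in for the fGP with a single GP on (location, global predictor) pairs with a separable covariance k_S(s, s') k_Z(z, z') are illustrative assumptions. `variance_share` is a hypothetical, deliberately crude proxy for the paper's contribution measures. Every run shares the same functional predictor x(s) and coefficient beta(s), while the global predictor z_r enters through g(s, z_r).

```python
import math
import random


def sq_exp(a, b, lengthscale):
    """Squared-exponential correlation exp(-(a - b)^2 / (2 * lengthscale^2)) (assumed kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))


def cholesky(a, jitter=1e-5):
    """Lower-triangular L with L L^T = a + jitter * I, for a symmetric PSD list-of-lists a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] + jitter - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gp_draw(cov, rng):
    """One zero-mean multivariate normal draw N(0, cov) computed as L z."""
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(len(cov))]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(cov))]


def simulate(locs, x_fn, z_runs, noise_sd=0.1, seed=0):
    """Simulate y_r(s) = x(s) * beta(s) + g(s, z_r) + eps_r(s) for every run r.

    beta(s): spatially varying coefficient, a GP over the locations.
    g(s, z): effect of the global predictor at location s. HYPOTHETICAL stand-in
             for an fGP: one GP on (s, z) with covariance k_S(s, s') * k_Z(z, z').
    """
    rng = random.Random(seed)
    beta = gp_draw([[sq_exp(a, b, 0.3) for b in locs] for a in locs], rng)
    pairs = [(s, z) for z in z_runs for s in locs]  # run-major ordering
    cov_g = [[sq_exp(s1, s2, 0.3) * sq_exp(z1, z2, 0.5) for (s2, z2) in pairs]
             for (s1, z1) in pairs]
    g_flat = gp_draw(cov_g, rng)
    n_s = len(locs)
    local, glob, y = [], [], []
    for r in range(len(z_runs)):
        lo = [x_fn(s) * beta[j] for j, s in enumerate(locs)]  # same in every run
        gl = g_flat[r * n_s:(r + 1) * n_s]                    # varies with z_r
        local.append(lo)
        glob.append(gl)
        y.append([lo[j] + gl[j] + rng.gauss(0.0, noise_sd) for j in range(n_s)])
    return y, local, glob


def variance_share(local, glob):
    """Crude proxy: fraction of pooled component variance due to each predictor type."""
    def var(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals) / len(vals)

    v_loc = var([v for row in local for v in row])
    v_glob = var([v for row in glob for v in row])
    return v_loc / (v_loc + v_glob), v_glob / (v_loc + v_glob)


locs = [j / 7 for j in range(8)]
z_runs = [0.0, 0.25, 0.5, 0.75, 1.0]
y, local, glob = simulate(locs, math.sin, z_runs)
print(variance_share(local, glob))
```

The separable covariance is used only because it is the simplest way to let nearby locations and nearby global-predictor values share information at the same time; the abstract does not say how the fGP covariance is actually constructed.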

Gaussian Process

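A Gaussian process (GP) is a type of stochastic process in probability theory and mathematical statistics: a collection of normally distributed random variables indexed by an index set. Every linear combination of the random variables in a Gaussian process is normally distributed, and every finite-dimensional distribution is jointly normal. Over a continuous index set, the distribution of the whole process is a Gaussian measure, which is why a GP is regarded as the infinite-dimensional generalization of the multivariate normal distribution. A Gaussian process is completely determined by its mean function and covariance function, and it inherits many properties of the normal distribution.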
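The finite-dimensional property is also how a GP is sampled in practice: restricted to finitely many index points, it is an ordinary multivariate normal vector with mean vector m_i = m(x_i) and covariance matrix K_ij = k(x_i, x_j). The minimal sketch below, in plain Python, draws such a vector as m + L z, where L is the Cholesky factor of K and z is standard normal. The squared-exponential covariance and its parameters are assumptions chosen for illustration; the definition above does not fix any particular kernel.

```python
import math
import random


def sq_exp_kernel(x1, x2, variance=1.0, lengthscale=0.5):
    """Assumed covariance k(x, x') = variance * exp(-(x - x')^2 / (2 * lengthscale^2))."""
    return variance * math.exp(-((x1 - x2) ** 2) / (2.0 * lengthscale ** 2))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite list-of-lists a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp(xs, mean_fn, kernel, jitter=1e-8, rng=random):
    """Draw one realization of a GP at the finite index points xs.

    Restricted to xs, the GP is the multivariate normal N(m, K) with
    m_i = mean_fn(xs[i]) and K_ij = kernel(xs[i], xs[j]); we return m + L z.
    A tiny jitter on the diagonal keeps the Cholesky step numerically stable.
    """
    n = len(xs)
    cov = [[kernel(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mean_fn(xs[i]) + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
```

As a quick check, drawing many samples at `xs = [0.0, 0.25, 0.5]` with a zero mean function and comparing the sample covariance of the first two coordinates with `sq_exp_kernel(0.0, 0.25)` (about 0.88) confirms that the draws have the intended finite-dimensional distribution.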
