A Gaussian process (GP) is proposed as a model for the posterior distribution of the local predictive ability of a model or expert, conditional on a vector of covariates, learned from historical predictions in the form of log predictive scores. Assuming Gaussian expert predictions and a Gaussian data-generating process, a linear transformation of the predictive score follows a noncentral chi-squared distribution with one degree of freedom. Motivated by this, we develop a noncentral chi-squared Gaussian process regression to flexibly model local predictive ability, with the posterior distribution of the latent GP function and the kernel hyperparameters sampled by Hamiltonian Monte Carlo. We show that a cube-root transformation of the log scores is approximately Gaussian with homoscedastic variance, making it possible to estimate the model much faster by marginalizing out the latent GP function analytically. A multi-output Gaussian process regression is also introduced to model the dependence in predictive ability between experts, for both inference and prediction purposes. Linear pools based on learned local predictive ability are applied to predict daily bike usage in Washington DC.
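The two distributional claims in the abstract can be checked by simulation. The sketch below, a minimal illustration and not the paper's implementation, assumes a Gaussian data-generating process and a Gaussian expert with a biased predictive mean (the values of `mu_true`, `mu_e`, `sigma_e` are hypothetical). It verifies that a linear transformation of the log predictive score matches the moments of a noncentral chi-squared distribution with one degree of freedom, and that a cube-root transformation sharply reduces skewness, consistent with approximate Gaussianity:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative (hypothetical) setup, not taken from the paper:
mu_true, sigma_true = 0.5, 1.0   # data-generating process N(mu_true, sigma_true^2)
mu_e, sigma_e = 0.0, 1.0         # expert's Gaussian predictive density N(mu_e, sigma_e^2)
y = rng.normal(mu_true, sigma_true, size=n)

# Log predictive score of the expert's Gaussian density at each realized y
log_score = -0.5 * np.log(2 * np.pi * sigma_e**2) - (y - mu_e) ** 2 / (2 * sigma_e**2)

# Linear transformation of the log score:
#   z = -2 * log_score - log(2*pi*sigma_e^2) = ((y - mu_e) / sigma_e)^2.
# With sigma_true = sigma_e, z is noncentral chi-squared with 1 degree of
# freedom and noncentrality lam = ((mu_true - mu_e) / sigma_e)^2.
z = -2 * log_score - np.log(2 * np.pi * sigma_e**2)
lam = ((mu_true - mu_e) / sigma_e) ** 2

print(z.mean(), 1 + lam)           # sample mean vs. theoretical mean 1 + lam
print(z.var(), 2 * (1 + 2 * lam))  # sample variance vs. theoretical 2*(1 + 2*lam)

# Cube-root (Wilson-Hilferty style) transformation, applied here to the
# chi-squared variable z for illustration: np.cbrt(z) is much closer to a
# symmetric Gaussian shape than z itself, as the drop in skewness shows.
def skewness(x):
    return float(np.mean((x - x.mean()) ** 3) / x.std() ** 3)

skew_z = skewness(z)
skew_w = skewness(np.cbrt(z))
print(skew_z, skew_w)  # skewness shrinks sharply after the cube root
```

Under this setup the sample mean and variance of `z` land close to the noncentral χ²₁ values, while the cube-root variable's skewness is a fraction of the untransformed one, which is what makes the faster Gaussian-marginalization estimator plausible.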