Fully Bayesian inference for latent variable Gaussian process models

Real engineering and scientific applications often involve one or more qualitative inputs. Standard Gaussian processes (GPs), however, cannot directly accommodate qualitative inputs. The recently introduced latent variable Gaussian process (LVGP) overcomes this issue by first mapping each qualitative factor to underlying latent variables (LVs) and then applying any standard GP covariance function over these LVs. The LVs are estimated in the same way as the other GP hyperparameters, through maximum likelihood estimation, and then plugged into the prediction expressions. However, this plug-in approach does not account for uncertainty in the estimation of the LVs, which can be significant, especially with limited training data. In this work, we develop a fully Bayesian approach for the LVGP model and for visualizing the effects of the qualitative inputs via their LVs. We also develop approximations for scaling up LVGPs and fully Bayesian inference for the LVGP hyperparameters. We conduct numerical studies comparing plug-in inference against fully Bayesian inference over a few engineering models and material design applications. In contrast to previous studies on standard GP modeling, which have largely concluded that a fully Bayesian treatment offers limited improvements, our results show that for LVGP modeling it offers significant improvements in prediction accuracy and uncertainty quantification over the plug-in approach.
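The two ideas in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. The first function maps each level of one qualitative factor to a latent coordinate and evaluates a squared-exponential kernel over the quantitative inputs joined with those coordinates. The second combines predictions made under several hyperparameter draws using the laws of total expectation and total variance, which is the step a plug-in estimate omits. The function names, the two-dimensional latent space, the kernel choice, and all numeric values are assumptions made for illustration.

```python
import numpy as np

def lvgp_kernel(x_num, levels, latent, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance over numeric inputs joined with latent coordinates.

    x_num:  (n, d) quantitative inputs
    levels: (n,) integer level index of one qualitative factor
    latent: (n_levels, q) latent coordinates, one row per level (illustrative values)
    """
    z = latent[levels]                        # latent position of each sample
    u = np.hstack([x_num / lengthscale, z])   # latent axes carry their own scale
    diff = u[:, None, :] - u[None, :, :]
    return variance * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))

def mixture_moments(means, variances):
    """Combine per-draw predictive means and variances into one mean and variance.

    Each row corresponds to one draw of the hyperparameters (including the LVs).
    Total variance = average within-draw variance + variance of the means across draws.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    return means.mean(axis=0), variances.mean(axis=0) + means.var(axis=0)

# Three samples: one quantitative input, one qualitative factor with three levels.
x_num = np.array([[0.1], [0.1], [0.9]])
levels = np.array([0, 1, 1])
latent = np.array([[0.0, 0.0], [0.5, -0.3], [1.2, 0.4]])  # hypothetical LV estimates
K = lvgp_kernel(x_num, levels, latent)
```

In this sketch, the second term of `mixture_moments` is zero when only a single point estimate of the LVs is used, which is one way to see how a plug-in approach can understate predictive uncertainty.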
