Supervised machine learning is the practice of fitting a parameterized model to labeled input-output data. Supervised methods have shown promise for learning efficient surrogate models that can (partially) replace expensive high-fidelity models, making many-query analyses such as optimization, uncertainty quantification, and inference tractable. However, when training data must be obtained by evaluating an expensive model or running an expensive experiment, the amount of available training data is often limited, which can make learned surrogate models unreliable. In many engineering and scientific settings, cheaper \emph{low-fidelity} models may also be available, arising for example from simplified physics modeling or coarse grids, and these models may be used to generate additional low-fidelity training data. The goal of \emph{multifidelity} machine learning is to use both high- and low-fidelity training data to learn a surrogate model that is cheaper to evaluate than the high-fidelity model but more accurate than any available low-fidelity model. This work proposes a new multifidelity training approach for Gaussian process regression (GPR) that uses low-fidelity data to define additional features that augment the input space of the learned model. The approach unites desirable properties of two separate classes of existing multifidelity GPR approaches: cokriging and autoregressive estimators. Numerical experiments on several test problems demonstrate both increased predictive accuracy and reduced computational cost relative to the state of the art.
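To make the idea of augmenting the input space concrete, the following is a minimal sketch of one possible reading: the low-fidelity model output at an input is appended to that input as an extra feature, and a Gaussian process is then fit to high-fidelity data in the augmented space. This is not the construction proposed in this work, whose details are not given here. The functions `f_hi`, `f_lo`, `augment`, `rbf_kernel`, and `gp_posterior_mean`, the test problem, and the hand-picked length scales are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical 1-D models, invented for illustration only.
def f_hi(x):
    """Stand-in for an expensive high-fidelity model."""
    return x * np.sin(8.0 * x)

def f_lo(x):
    """Stand-in for a cheap low-fidelity model of the same quantity."""
    return np.sin(8.0 * x)

def augment(x):
    """Append the low-fidelity output as a second input feature: [x, f_lo(x)]."""
    return np.column_stack([x, f_lo(x)])

def rbf_kernel(A, B, length_scales):
    """Squared-exponential kernel with one length scale per input feature."""
    diff = (A[:, None, :] - B[None, :, :]) / length_scales
    return np.exp(-0.5 * np.sum(diff**2, axis=-1))

def gp_posterior_mean(X_train, y_train, X_test, length_scales, jitter=1e-8):
    """Zero-mean GP posterior mean; jitter stabilizes the linear solve."""
    K = rbf_kernel(X_train, X_train, length_scales) + jitter * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(X_test, X_train, length_scales) @ alpha

# A small high-fidelity training set (assumed budget of 6 evaluations).
x_train = np.linspace(0.0, 1.0, 6)
y_train = f_hi(x_train)
x_test = np.linspace(0.0, 1.0, 50)

# Fit in the augmented space; the low-fidelity model is evaluated at every input.
X_aug_train = augment(x_train)
X_aug_test = augment(x_test)
length_scales = np.array([0.3, 0.5])  # assumed values, not tuned
pred_aug = gp_posterior_mean(X_aug_train, y_train, X_aug_test, length_scales)

rmse_aug = np.sqrt(np.mean((pred_aug - f_hi(x_test)) ** 2))
print("RMSE of augmented-input GP on the test grid:", rmse_aug)
```

In this sketch the kernel hyperparameters are fixed by hand. In practice they would typically be chosen by maximizing the marginal likelihood, and the low-fidelity model must be evaluated at every prediction point.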