A statistical emulator can serve as a surrogate for complex physics-based calculations, drastically reducing the computational cost. Its successful implementation hinges on an accurate representation of a nonlinear response surface over a high-dimensional input space. Conventional "space-filling" designs, including random sampling and Latin hypercube sampling, become inefficient as the dimensionality of the input variables increases, and the predictive accuracy of the emulator can degrade substantially for a test input distant from the training input set. To address this fundamental challenge, we develop a reliable emulator for predicting complex functionals by active learning with error control (ALEC). The algorithm is applicable to infinite-dimensional mappings and delivers high-fidelity predictions with a controlled predictive error. We demonstrate its computational efficiency by emulating classical density functional theory (cDFT) calculations, a statistical-mechanical method widely used in modeling the equilibrium properties of complex molecular systems. We show that ALEC is much more accurate than conventional emulators based on Gaussian processes with "space-filling" designs and than alternative active learning methods. It is also computationally more efficient than direct cDFT calculations. ALEC can be a reliable building block for emulating expensive functionals owing to its minimal computational cost, controllable predictive error, and fully automatic features.
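The general idea of coupling an emulator with an error threshold can be illustrated with a minimal one-dimensional sketch. This is not the ALEC algorithm itself, whose details the abstract does not give. It assumes a zero-mean Gaussian process with a fixed squared-exponential kernel, uses the GP predictive standard deviation as a proxy for the predictive error, and calls a hypothetical `expensive_simulator` (a stand-in for a cDFT calculation) whenever that proxy exceeds a tolerance `tol`. All function names, the kernel settings, and the tolerance are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_new, nugget=1e-6):
    """Zero-mean, unit-variance GP posterior mean and standard deviation."""
    K = rbf_kernel(x_train, x_train) + nugget * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, k_star)
    mean = k_star.T @ alpha
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None)
    return mean, np.sqrt(var)

def expensive_simulator(x):
    """Hypothetical stand-in for a costly calculation such as cDFT."""
    return np.sin(6.0 * x) + 0.5 * x

def emulate_with_error_control(x_test, x_train, y_train, tol=0.05):
    """Predict each test input; run the simulator when the GP is too uncertain."""
    x_train, y_train = x_train.copy(), y_train.copy()
    predictions, n_simulator_calls = [], 0
    for x in x_test:
        mean, std = gp_predict(x_train, y_train, np.array([x]))
        if std[0] > tol:
            # Uncertainty too large: compute the exact value and augment
            # the training set so nearby future inputs are covered.
            y = expensive_simulator(x)
            x_train = np.append(x_train, x)
            y_train = np.append(y_train, y)
            n_simulator_calls += 1
        else:
            # Uncertainty within tolerance: accept the emulator prediction.
            y = mean[0]
        predictions.append(y)
    return np.array(predictions), n_simulator_calls

rng = np.random.default_rng(0)
x_init = np.array([0.0, 0.5, 1.0])
x_test = rng.uniform(0.0, 1.0, size=200)
pred, n_calls = emulate_with_error_control(
    x_test, x_init, expensive_simulator(x_init)
)
max_err = np.max(np.abs(pred - expensive_simulator(x_test)))
print(f"simulator calls: {n_calls} of {len(x_test)}, max abs error: {max_err:.3g}")
```

In this toy setting the simulator is invoked only for test inputs that lie far from the current training set, so the number of expensive calls stays well below the number of predictions. The predictive standard deviation is only a heuristic error proxy here; it bounds the true error only insofar as the kernel assumptions match the target function.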