Machine learning methods are increasingly used to build computationally inexpensive surrogates for complex physical models. The predictive capability of these surrogates suffers when data are noisy, sparse, or time-dependent. As we are interested in finding a surrogate that provides valid predictions of any potential future model evaluations, we introduce an online learning method empowered by optimizer-driven sampling. The method has two advantages over current approaches. First, it ensures that all turning points on the model response surface are included in the training data. Second, after any new model evaluations, surrogates are tested and "retrained" (updated) if the "score" drops below a validity threshold. Tests on benchmark functions reveal that optimizer-directed sampling generally outperforms traditional sampling methods in terms of accuracy around local extrema, even when the scoring metric favors overall accuracy. We apply our method to simulations of nuclear matter to demonstrate that highly accurate surrogates for the nuclear equation of state can be reliably auto-generated from expensive calculations using a few model evaluations.
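The test-and-retrain loop described above can be illustrated with a minimal sketch. It is not the authors' implementation, and every name and value in it (`model`, `local_search`, `predict`, `score`, `build_surrogate`, the 0.98 threshold) is an assumption made for illustration. A cheap 1-D function stands in for the expensive model, a simple pattern search stands in for the optimizer, and inverse-distance-weighted interpolation stands in for the learned surrogate. Each optimizer run logs every model evaluation it makes, so the points near the turning point it converges to end up in the training data.

```python
import math
import random


def model(x):
    """Hypothetical stand-in for an expensive model evaluation."""
    return math.sin(3.0 * x) + 0.5 * x


def local_search(f, x0, sign, lo, hi, step=0.25, tol=1e-4):
    """Pattern search minimizing sign*f on [lo, hi]; logs every evaluation."""
    x, fx = x0, f(x0)
    evals = [(x, fx)]
    while step > tol:
        moved = False
        for cand in (x - step, x + step):
            if lo <= cand <= hi:
                fc = f(cand)
                evals.append((cand, fc))
                if sign * fc < sign * fx:  # accept the first improving move
                    x, fx, moved = cand, fc, True
                    break
        if not moved:
            step *= 0.5  # no improvement at this scale: refine the step
    return evals, x


def predict(train, x, power=2.0):
    """Inverse-distance-weighted interpolation (assumed surrogate form)."""
    num = den = 0.0
    for xi, yi in train:
        d = abs(x - xi)
        if d == 0.0:
            return yi  # exact at training points
        w = d ** -power
        num += w * yi
        den += w
    return num / den


def score(train, batch):
    """Assumed metric: 1 minus mean absolute error; higher is better."""
    err = sum(abs(predict(train, x) - y) for x, y in batch) / len(batch)
    return 1.0 - err


def build_surrogate(f, lo, hi, threshold=0.98, rounds=20, seed=0):
    """Sample with the optimizer, test the surrogate, retrain when it fails."""
    rng = random.Random(seed)
    data, train, retrains = [], [], 0
    for k in range(rounds):
        sign = 1.0 if k % 2 == 0 else -1.0  # alternate minima and maxima
        batch, _ = local_search(f, rng.uniform(lo, hi), sign, lo, hi)
        data.extend(batch)
        # Test on the new evaluations; update only if the score is too low.
        if not train or score(train, batch) < threshold:
            train = list(data)
            retrains += 1
    return train, retrains


if __name__ == "__main__":
    train, retrains = build_surrogate(model, 0.0, 3.0)
    print(len(train), "training points,", retrains, "retrains")
```

In this sketch "retraining" only refreshes the stored points, because the interpolant has no fitted parameters. With a parametric surrogate, the same branch would trigger a refit on all data collected so far.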