Machine learning methods are increasingly used to build computationally inexpensive surrogates for complex physical models. The predictive capability of these surrogates suffers when data are noisy, sparse, or time-dependent. As we are interested in finding a surrogate that provides valid predictions of any potential future model evaluations, we introduce an online learning method empowered by optimizer-driven sampling. The method has two advantages over current approaches. First, it ensures that all turning points on the model response surface are included in the training data. Second, after any new model evaluations, surrogates are tested and "retrained" (updated) if the "score" drops below a validity threshold. Tests on benchmark functions reveal that optimizer-directed sampling generally outperforms traditional sampling methods in terms of accuracy around local extrema, even when the scoring metric favors overall accuracy. We apply our method to simulations of nuclear matter to demonstrate that highly accurate surrogates for the nuclear equation of state can be reliably auto-generated from expensive calculations using a few model evaluations.
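The test-and-retrain loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional toy `model`, the piecewise-linear `LinearSurrogate`, the simple pattern-search optimizer `local_search`, and the `tol` and `patience` parameters are all hypothetical stand-ins. Validity is checked here with a worst-case error bound, so the surrogate counts as invalid when its error on new evaluations exceeds `tol`, which mirrors a score dropping below a threshold.

```python
import math
import random
from bisect import bisect_left


def model(x):
    """Hypothetical stand-in for an expensive simulation."""
    return math.sin(3.0 * x) + 0.5 * x


class LinearSurrogate:
    """Piecewise-linear interpolant over the stored model evaluations."""

    def __init__(self):
        self.data = {}  # x -> model(x)
        self.xs = []    # sorted sample locations

    def update(self, points):
        """'Retrain' by merging new (x, y) pairs into the training data."""
        self.data.update(points)
        self.xs = sorted(self.data)

    def predict(self, x):
        xs = self.xs
        if x <= xs[0]:
            return self.data[xs[0]]
        if x >= xs[-1]:
            return self.data[xs[-1]]
        i = bisect_left(xs, x)  # xs[i-1] < x <= xs[i]
        x0, x1 = xs[i - 1], xs[i]
        t = (x - x0) / (x1 - x0)
        return (1 - t) * self.data[x0] + t * self.data[x1]


def local_search(f, x, lo, hi, sign, step=0.5, tol=1e-3):
    """Pattern search for a local extremum of f on [lo, hi].

    sign=+1 seeks a minimum, sign=-1 a maximum. Returns every (x, f(x))
    evaluated along the way, so the optimizer's path becomes training data.
    """
    evals = {x: f(x)}
    while step > tol:
        best = x
        for cand in (x - step, x + step):
            cand = min(max(cand, lo), hi)  # clamp to the domain
            if cand not in evals:
                evals[cand] = f(cand)
            if sign * evals[cand] < sign * evals[best]:
                best = cand
        if best == x:
            step *= 0.5  # no improvement: refine the step
        else:
            x = best     # move to the better neighbor
    return evals


def fit_online(f, lo, hi, tol=0.05, patience=5, max_rounds=200, seed=0):
    """Alternate min/max searches; retrain only when the surrogate fails."""
    rng = random.Random(seed)
    surrogate = LinearSurrogate()
    surrogate.update({lo: f(lo), hi: f(hi)})
    valid_streak = 0
    for round_no in range(max_rounds):
        sign = 1 if round_no % 2 == 0 else -1
        new = local_search(f, rng.uniform(lo, hi), lo, hi, sign)
        # Test the surrogate on the new evaluations before it has seen them.
        worst = max(abs(surrogate.predict(x) - y) for x, y in new.items())
        if worst > tol:
            surrogate.update(new)  # invalid: retrain on the new data
            valid_streak = 0
        else:
            valid_streak += 1      # still valid: keep the surrogate as is
            if valid_streak >= patience:
                break
    return surrogate


if __name__ == "__main__":
    s = fit_online(model, 0.0, 3.0)
    print(len(s.data), "model evaluations stored")
```

Because each search converges on a local minimum or maximum, the stored evaluations cluster around turning points, which is the property the abstract attributes to optimizer-directed sampling. The stopping rule (several consecutive valid tests) is a heuristic chosen for this sketch and does not guarantee accuracy everywhere on the domain.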