Bayesian optimization (BO) is a machine learning method that is increasingly used to guide experimental optimization tasks in materials science. To emulate the large number of input variables and the noisy results typical of experimental materials research, we perform batch BO simulations with six design variables over a range of noise levels. Two test cases relevant to materials science problems are examined: a needle-in-a-haystack case (Ackley function) that may be encountered in, e.g., molecule optimization, and a smooth landscape with a local optimum in addition to the global optimum (Hartmann function) that may be encountered in, e.g., material composition optimization. We present learning curves, performance metrics, and visualizations that track the optimization progress and evaluate how the optimization outcomes are affected by noise, batch-picking method, choice of acquisition function, and exploration hyperparameter values. We find that the effects of noise depend on the problem landscape: noise degrades the results of the needle-in-a-haystack search (Ackley) dramatically more than those of the smooth landscape (Hartmann). In Hartmann, however, we observe an increasing probability of landing on the local optimum as noise increases. Therefore, prior knowledge of the problem domain structure and noise level is essential when designing BO for materials research experiments. Synthetic data studies, with known ground truth and controlled noise levels, enable us to isolate and evaluate the impact of different batch BO components, e.g., acquisition policy, objective metrics, and hyperparameter values, before transitioning to the inherent uncertainties of real experimental systems. The results and methodology of this study will facilitate greater use of BO in guiding experimental materials research, specifically in settings with a large number of design variables to optimize.
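The two synthetic test cases named above can be sketched in a few lines, which may help make "known ground truth and controlled noise levels" concrete. This is a minimal illustration and not the study's code: it assumes the standard textbook forms of the 6D Ackley and Hartmann functions in the minimization convention, and it assumes additive zero-mean Gaussian noise with standard deviation `sigma`. The helper name `noisy` and the value `sigma=0.1` are hypothetical choices made only for this example; the study may scale, sign-flip, or perturb the objectives differently.

```python
import numpy as np

# Standard Hartmann-6 constants (textbook values; an assumption, not taken from the study).
ALPHA = np.array([1.0, 1.2, 3.0, 3.2])
A = np.array([
    [10.0, 3.0, 17.0, 3.5, 1.7, 8.0],
    [0.05, 10.0, 17.0, 0.1, 8.0, 14.0],
    [3.0, 3.5, 1.7, 10.0, 17.0, 8.0],
    [17.0, 8.0, 0.05, 10.0, 0.1, 14.0],
])
P = 1e-4 * np.array([
    [1312, 1696, 5569, 124, 8283, 5886],
    [2329, 4135, 8307, 3736, 1004, 9991],
    [2348, 1451, 3522, 2883, 3047, 6650],
    [4047, 8828, 8732, 5743, 1091, 381],
])


def ackley(x, a=20.0, b=0.2, c=2.0 * np.pi):
    """Needle-in-a-haystack landscape: a narrow global minimum of 0 at the origin."""
    x = np.asarray(x, dtype=float)
    d = x.size
    term1 = -a * np.exp(-b * np.sqrt(np.sum(x ** 2) / d))
    term2 = -np.exp(np.sum(np.cos(c * x)) / d)
    return term1 + term2 + a + np.e


def hartmann6(x):
    """Smooth 6D landscape on [0, 1]^6 with a global minimum and local minima."""
    x = np.asarray(x, dtype=float)
    inner = np.sum(A * (x - P) ** 2, axis=1)  # one exponent per Gaussian-like term
    return -np.sum(ALPHA * np.exp(-inner))


def noisy(f, sigma, rng):
    """Hypothetical helper: wrap f so each evaluation gets additive Gaussian noise."""
    def wrapped(x):
        return f(x) + rng.normal(0.0, sigma)
    return wrapped


rng = np.random.default_rng(0)
noisy_hartmann = noisy(hartmann6, sigma=0.1, rng=rng)

# Commonly cited location of the Hartmann-6 global minimum (value about -3.322).
x_star = np.array([0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573])
print("noise-free:", hartmann6(x_star))
print("noisy     :", noisy_hartmann(x_star))
```

Because the noise-free functions remain available, the true value at any point a simulated optimizer proposes can still be computed, which is what makes it possible to separate the effect of noise from the effect of the acquisition policy or batch-picking method.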