Optimization of chemical systems and processes has been enhanced and enabled by the guidance of algorithms and analytical approaches. While many methods systematically investigate how underlying variables govern a given outcome, a substantial number of experiments is often needed to model these relationships accurately. As chemical systems increase in complexity, non-exhaustive methods must propose experiments that efficiently optimize the underlying objective while, ideally, avoiding convergence on unsatisfactory local minima. We have developed the Paddy software package around the Paddy Field Algorithm, a biologically inspired evolutionary optimization algorithm that propagates parameters without direct inference of the underlying objective function. Benchmarked against the Tree of Parzen Estimators (TPE), a Bayesian algorithm implemented in the Hyperopt software library, Paddy optimizes efficiently with lower runtime and avoids early convergence. Herein we report these findings for four cases: global optimization of a two-dimensional bimodal distribution, interpolation of an irregular sinusoidal function, hyperparameter optimization of an artificial neural network tasked with classification of solvent for reaction components, and targeted molecule generation via optimization of input vectors for a decoder network. We anticipate that Paddy's ease of use will aid automated experimentation, where minimizing the number of investigative trials and/or obtaining a diversity of suitable solutions is of high priority.
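To make the idea of propagating parameters without modeling the objective more concrete, the following is a minimal sketch of a paddy-field-style evolutionary loop (sow, select, seed, pollinate, disperse) applied to a toy two-dimensional bimodal function. It is not the Paddy package's API: the names `bimodal` and `paddy_sketch`, the test function, the seed-count scaling, the neighbor-density factor, and all default parameter values are illustrative assumptions.

```python
import math
import random


def bimodal(x, y):
    """Hypothetical toy objective: global peak near (0.7, 0.7), local peak near (0.2, 0.2)."""
    global_peak = math.exp(-((x - 0.7) ** 2 + (y - 0.7) ** 2) / 0.02)
    local_peak = 0.6 * math.exp(-((x - 0.2) ** 2 + (y - 0.2) ** 2) / 0.02)
    return global_peak + local_peak


def paddy_sketch(objective, n_init=50, n_keep=10, max_seeds=8,
                 radius=0.1, sigma=0.05, iterations=10, seed=0):
    """Maximize `objective` on the unit square; all defaults are assumed values."""
    rng = random.Random(seed)
    # Sowing: scatter the initial plants uniformly at random.
    plants = [(rng.random(), rng.random()) for _ in range(n_init)]
    best_point, best_score = None, -math.inf

    for _ in range(iterations):
        # Selection: evaluate every plant and keep the fittest n_keep.
        scored = sorted(((objective(*p), p) for p in plants), reverse=True)[:n_keep]
        if scored[0][0] > best_score:
            best_score, best_point = scored[0]

        # Seeding: seed count scales with min-max normalized fitness.
        low, high = scored[-1][0], scored[0][0]
        span = (high - low) or 1.0

        # Pollination: count retained neighbors within `radius` of each plant.
        counts = [
            sum(1 for j, (_, q) in enumerate(scored)
                if j != i and math.dist(p, q) < radius)
            for i, (_, p) in enumerate(scored)
        ]
        max_count = max(counts) or 1

        offspring = []
        for (score, p), count in zip(scored, counts):
            seeds = max_seeds * (score - low) / span
            # Plants in denser neighborhoods keep a larger share of their seeds.
            n_children = round(seeds * math.exp(count / max_count - 1))
            # Dispersion: Gaussian perturbation of the parent, clipped to [0, 1].
            for _ in range(n_children):
                offspring.append(tuple(
                    min(1.0, max(0.0, rng.gauss(c, sigma))) for c in p
                ))

        # Parents survive alongside their offspring into the next round.
        plants = [p for _, p in scored] + offspring

    return best_point, best_score


if __name__ == "__main__":
    point, score = paddy_sketch(bimodal)
    print("best point:", point, "score:", score)
```

The loop only ranks and resamples evaluated points; at no stage does it fit a surrogate of the objective, which is the contrast with a Bayesian approach such as TPE that the abstract draws. The density-based pollination step is what, in this sketch, concentrates new samples where several good plants already cluster.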