The goal of most materials discovery is to find materials that are superior to those currently known. Fundamentally, this is close to extrapolation, which is a weak point for most machine learning models that learn the probability distribution of data. Herein, we develop reinforcement learning-guided combinatorial chemistry, a rule-based molecular designer driven by a trained policy that selects each subsequent molecular fragment to build a target molecule. Because our model can, in principle, generate every molecular structure obtainable by combining molecular fragments, it can discover unknown molecules with superior properties. We demonstrate theoretically and empirically that our model is better suited to discovering superior compounds than probability distribution-learning models. In an experiment aimed at discovering molecules that hit seven extreme target properties, our model found 1,315 molecules that hit all seven targets and 7,629 molecules that hit five of the targets in 100,000 trials, whereas the probability distribution-learning models failed. Moreover, we confirmed that every molecule generated under the binding rules of molecular fragments is chemically valid. To illustrate performance on real problems, we also demonstrate that our model works well in two practical applications: discovering protein docking molecules and HIV inhibitors.
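The idea of a policy that picks the next fragment under binding rules can be illustrated with a toy sketch. Everything below is hypothetical and is not the authors' implementation: the fragment names, the site labels, the `ALLOWED` rule table, and the uniformly random `random_policy` stand in for a real fragment library, real chemical binding rules, and a trained reinforcement learning policy. The sketch only shows the control flow: at each step the rules restrict the action set, and the policy chooses among the permitted actions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fragment: a name plus labels of its attachment sites."""
    name: str
    sites: tuple


# Hypothetical binding rules: pairs of site labels that are allowed to bond.
ALLOWED = {("a", "b"), ("b", "a"), ("c", "c")}

# Hypothetical fragment library.
FRAGMENTS = [
    Fragment("F1", ("a",)),
    Fragment("F2", ("b", "a")),
    Fragment("F3", ("c", "b")),
    Fragment("F4", ("c",)),
]


def valid_actions(open_sites):
    """List (open_site_index, fragment, fragment_site_index) permitted by ALLOWED."""
    actions = []
    for i, site in enumerate(open_sites):
        for frag in FRAGMENTS:
            for j, other in enumerate(frag.sites):
                if (site, other) in ALLOWED:
                    actions.append((i, frag, j))
    return actions


def random_policy(state, actions, rng):
    """Placeholder for a trained policy: picks uniformly among valid actions."""
    return rng.choice(actions)


def build_molecule(policy, rng, max_steps=6):
    """Assemble fragments step by step; only rule-permitted bonds are ever made."""
    start = rng.choice(FRAGMENTS)
    parts = [start.name]
    open_sites = list(start.sites)
    for _ in range(max_steps):
        actions = valid_actions(open_sites)
        if not actions:  # no open site can accept any fragment
            break
        i, frag, j = policy((tuple(parts), tuple(open_sites)), actions, rng)
        open_sites.pop(i)  # the chosen open site is consumed by the new bond
        # The new fragment's remaining sites become open for later steps.
        open_sites.extend(s for k, s in enumerate(frag.sites) if k != j)
        parts.append(frag.name)
    return parts, open_sites


if __name__ == "__main__":
    parts, leftover = build_molecule(random_policy, random.Random(0))
    print("fragments used:", parts, "unfilled sites:", leftover)
```

Because the policy can only choose from `valid_actions`, every assembled structure satisfies the rule table by construction, which mirrors the abstract's point that validity follows from the binding rules rather than from learning. In a reinforcement learning setting, `random_policy` would be replaced by a policy trained on a reward derived from the target properties.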