Materials Discovery with Extreme Properties via Reinforcement Learning-Guided Combinatorial Chemistry

The goal of most materials discovery is to find materials that are superior to those currently known. Fundamentally, this is close to extrapolation, which is a weak point for most machine learning models that learn the probability distribution of data. Herein, we develop reinforcement learning-guided combinatorial chemistry, a rule-based molecular designer driven by a trained policy that selects subsequent molecular fragments to build a target molecule. Since our model has the potential to generate all possible molecular structures that can be obtained from combinations of molecular fragments, unknown molecules with superior properties can be discovered. We theoretically and empirically demonstrate that our model is more suitable for discovering better compounds than probability distribution-learning models. In an experiment aimed at discovering molecules that hit seven extreme target properties, our model discovered 1,315 molecules that hit all seven targets and 7,629 molecules that hit five of the targets out of 100,000 trials, whereas the probability distribution-learning models failed. Moreover, every molecule generated under the binding rules of molecular fragments was confirmed to be chemically valid. To illustrate performance on real problems, we also demonstrate that our models work well on two practical applications: discovering protein docking molecules and HIV inhibitors.
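The idea of a policy choosing fragments under binding rules can be illustrated with a small toy sketch. This is not the authors' implementation: the `Fragment` class, the fragment list, the scores, and the capping rule are all hypothetical stand-ins, and the two policies are hand-written placeholders rather than trained reinforcement learning agents. The sketch only shows how restricting each step to rule-compliant fragments keeps every assembled result valid, whichever policy picks the fragments.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    sites: int    # open attachment points (hypothetical toy attribute)
    score: float  # toy contribution to the target property


# Hypothetical fragment library; real fragments would be chemical substructures.
FRAGMENTS = [Fragment("A", 1, 1.0), Fragment("B", 2, 2.5), Fragment("C", 3, 4.0)]


def allowed_actions(open_sites, steps_left_after, first):
    """Toy binding rule: keep fragments that still let every open site be capped."""
    out = []
    for f in FRAGMENTS:
        # Joining consumes one site on the molecule and one on the fragment.
        new_open = f.sites if first else open_sites + f.sites - 2
        if new_open <= steps_left_after:
            out.append(f)
    return out


def build(policy, max_steps=6, rng=None):
    """Assemble one toy molecule; returns (fragment names, open sites, total score)."""
    rng = rng or random.Random(0)
    names, open_sites, total = [], 0, 0.0
    for t in range(max_steps):
        first = t == 0
        if not first and open_sites == 0:
            break  # no dangling sites left: the molecule is complete
        actions = allowed_actions(open_sites, max_steps - t - 1, first)
        frag = policy(open_sites, actions, rng)
        open_sites = frag.sites if first else open_sites + frag.sites - 2
        names.append(frag.name)
        total += frag.score
    return names, open_sites, total


def random_policy(open_sites, actions, rng):
    # Placeholder for an untrained policy.
    return rng.choice(actions)


def greedy_policy(open_sites, actions, rng):
    # Placeholder for a trained policy: pick the highest-scoring allowed fragment.
    return max(actions, key=lambda f: f.score)


if __name__ == "__main__":
    print(build(random_policy, rng=random.Random(1)))
    print(build(greedy_policy))
```

In this toy setting, validity comes from the action mask in `allowed_actions`, not from the policy, so swapping in a different policy changes the score of the result but never leaves an open site behind.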

