Chemical reactions are the fundamental building blocks of drug design and organic chemistry research. Machine learning for chemistry is advancing rapidly, and there is a growing need for a large-scale deep-learning framework that can efficiently capture the basic rules of chemical reactions. In this paper, we propose a unified framework that addresses both reaction representation learning and molecule generation, allowing a more holistic approach. Inspired by the mechanisms of organic chemistry, we develop a novel pretraining framework that lets us incorporate inductive biases into the model. Our framework achieves state-of-the-art results on challenging downstream tasks. Because it encodes chemical knowledge, the framework can be applied to reaction-based generative models, overcoming the limitation of current molecule generation models that rely on a small number of reaction templates. In extensive experiments, our model generates high-quality, synthesizable, drug-like structures. Overall, our work is a significant step toward a large-scale deep-learning framework for a variety of reaction-based applications.