Chemical reactions are the fundamental building blocks of drug design and organic chemistry research. In recent years, there has been a growing need for a large-scale deep-learning framework that can efficiently capture the basic rules of chemical reactions. In this paper, we propose a unified framework that addresses both reaction representation learning and molecule generation, allowing for a more holistic approach. Inspired by organic chemistry mechanisms, we develop a novel pretraining framework that incorporates inductive biases into the model. Our framework achieves state-of-the-art results on challenging downstream tasks. Because it encodes chemical knowledge, the framework can be applied to reaction-based generative models, overcoming a limitation of current molecule generation models, which rely on a small number of reaction templates. In extensive experiments, our model generates high-quality, synthesizable, drug-like structures. Overall, our work is a significant step toward a large-scale deep-learning framework for a variety of reaction-based applications.