Single-step retrosynthesis is a crucial task in organic chemistry and drug design: it requires identifying the reactants needed to synthesize a specific compound. With the advent of computer-aided synthesis planning, there is growing interest in using machine-learning techniques to facilitate the process. Existing template-free machine-learning models typically use transformer architectures and represent molecules as 1D sequences. However, these methods often struggle to fully leverage the rich topological information of the molecule and to align atoms between the product and the reactants, so their results are not as competitive as those of semi-template models. Our proposed method, Node-Aligned Graph-to-Graph (NAG2G), is also a transformer-based template-free model, but it uses 2D molecular graphs and 3D conformation information. Furthermore, our approach simplifies the incorporation of product-reactant atom-mapping alignment: it uses node alignment to determine a specific order for node generation and generates molecular graphs auto-regressively, node by node. This ensures that the node generation order coincides with the node order in the input graph, overcoming the difficulty of determining a specific node generation order in auto-regressive graph generation. Our extensive benchmarking results demonstrate that the proposed NAG2G can outperform the previous state-of-the-art baselines on various metrics.
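The node-alignment idea described in the abstract can be illustrated with a small sketch. This is not the NAG2G implementation: the function name `node_aligned_order`, the `(symbol, atom_map)` tuple representation, and the choice to append unmapped reactant atoms (for example, leaving-group atoms) after the mapped ones are all assumptions made for illustration. The sketch only shows how an atom mapping could fix a generation order for reactant nodes that follows the node order of the input product graph.

```python
from typing import List, Tuple

# Hypothetical representation: (element symbol, atom-map number).
# An atom-map number of 0 means the atom has no counterpart in the product.
Atom = Tuple[str, int]


def node_aligned_order(product: List[Atom], reactants: List[Atom]) -> List[int]:
    """Return indices into `reactants` in a product-aligned generation order.

    Mapped reactant atoms are emitted in the order their counterparts appear
    in the product. Unmapped atoms are appended afterwards in their original
    order (an assumption of this sketch, not a detail taken from the paper).
    """
    # Position of each atom-map number in the product's node order.
    rank = {amap: i for i, (_, amap) in enumerate(product) if amap > 0}

    mapped = [i for i, (_, amap) in enumerate(reactants) if amap in rank]
    unmapped = [i for i, (_, amap) in enumerate(reactants) if amap not in rank]

    mapped.sort(key=lambda i: rank[reactants[i][1]])
    return mapped + unmapped


# Toy example with made-up atoms: three mapped atoms and one unmapped atom.
product = [("C", 1), ("C", 2), ("O", 3)]
reactants = [("O", 3), ("Br", 0), ("C", 2), ("C", 1)]

order = node_aligned_order(product, reactants)
ordered_symbols = [reactants[i][0] for i in order]
```

In this toy case the mapped atoms come out in the product's order (C, C, O) and the unmapped Br follows, so an auto-regressive decoder trained on such targets would see one deterministic node order per example instead of having to choose among many equivalent orderings.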