Retrosynthesis, the process of breaking down a target molecule into simpler precursors through a series of valid reactions, lies at the core of organic chemistry and drug development. Although recent machine learning (ML) research has advanced single-step retrosynthesis models and the multi-step route searches built on them, these approaches remain limited by the vast combinatorial space of candidate pathways. Meanwhile, large language models (LLMs) have exhibited remarkable chemical knowledge, suggesting that they could tackle complex decision-making tasks in chemistry. In this work, we investigate whether LLMs can successfully navigate the highly constrained problem of multi-step retrosynthesis planning. We introduce an efficient scheme for encoding reaction pathways and present a new route-level search strategy that moves beyond conventional step-by-step reactant prediction. Through comprehensive evaluations, we show that our LLM-augmented approach excels at retrosynthesis planning and extends naturally to the broader challenge of synthesizable molecular design.