Transition states (TSs) are central to understanding and quantitatively predicting chemical reactivity and reaction mechanisms. Although traditional TS generation methods are computationally expensive, recent generative modeling approaches have enabled chemically meaningful TS prediction for relatively small molecules. However, these methods fail to generalize to practically relevant reaction substrates because of distribution shifts induced by increasing molecular sizes. Furthermore, TS geometries for larger molecules are not available at scale, making it infeasible to train generative models from scratch on such molecules. To address these challenges, we introduce FragmentFlow: a divide-and-conquer approach that trains a generative model to predict TS geometries for the reactive core atoms, which define the reaction mechanism. The full TS structure is then reconstructed by re-attaching substituent fragments to the predicted core. By operating on reactive cores, whose size and composition remain relatively invariant across molecular contexts, FragmentFlow mitigates distribution shifts in generative modeling. Evaluated on a new curated dataset of reactions involving reactants with up to 33 heavy atoms, FragmentFlow correctly identifies 90% of TSs while requiring 30% fewer saddle-point optimization steps than classical initialization schemes. These results point toward scalable TS generation for high-throughput reactivity studies.
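The reconstruction step, in which substituent fragments are re-attached to a predicted core, can be illustrated with a rigid-body alignment. The abstract does not specify how FragmentFlow performs this step, so the following is only a minimal sketch under stated assumptions: each fragment shares a few anchor atoms with the reactive core, and the fragment is placed by a Kabsch least-squares fit of those anchors onto their predicted TS positions. The function names `kabsch` and `reattach_fragment` are hypothetical and are not taken from the paper.

```python
import numpy as np


def kabsch(P, Q):
    """Rigid transform (R, t) minimizing ||(P @ R.T + t) - Q|| over rotations.

    P, Q: (n, 3) arrays of matched points; n >= 3 non-collinear points are
    needed for the rotation to be uniquely determined.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_c = P.mean(axis=0)
    q_c = Q.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (P - p_c).T @ (Q - q_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_c - R @ p_c
    return R, t


def reattach_fragment(anchor_ref, anchor_pred, fragment_ref):
    """Place a substituent fragment onto a predicted core (assumed scheme).

    anchor_ref:   (n, 3) anchor-atom coordinates in the reference geometry
    anchor_pred:  (n, 3) the same atoms in the predicted core TS geometry
    fragment_ref: (m, 3) fragment atoms in the reference geometry
    Returns the (m, 3) fragment coordinates in the predicted TS frame.
    """
    R, t = kabsch(anchor_ref, anchor_pred)
    return np.asarray(fragment_ref, dtype=float) @ R.T + t
```

A rigid fit of this kind keeps the fragment's internal geometry unchanged and only changes its placement relative to the core. Any relaxation of the substituents would be left to the subsequent saddle-point optimization.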