Transition states (TSs) are central to understanding and quantitatively predicting chemical reactivity and reaction mechanisms. Although traditional TS generation methods are computationally expensive, recent generative modeling approaches have enabled chemically meaningful TS prediction for relatively small molecules. However, these methods fail to generalize to practically relevant reaction substrates because of distribution shifts induced by increasing molecular sizes. Furthermore, TS geometries for larger molecules are not available at scale, making it infeasible to train generative models from scratch on such molecules. To address these challenges, we introduce FragmentFlow: a divide-and-conquer approach that trains a generative model to predict TS geometries for the reactive core atoms, which define the reaction mechanism. The full TS structure is then reconstructed by re-attaching substituent fragments to the predicted core. By operating on reactive cores, whose size and composition remain relatively invariant across molecular contexts, FragmentFlow mitigates distribution shifts in generative modeling. Evaluated on a new curated dataset of reactions involving reactants with up to 33 heavy atoms, FragmentFlow correctly identifies 90% of TSs while requiring 30% fewer saddle-point optimization steps than classical initialization schemes. These results point toward scalable TS generation for high-throughput reactivity studies.
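As a rough illustration of the reconstruction step, the sketch below re-attaches one substituent fragment to a predicted core by rigid alignment. The abstract does not specify how re-attachment is performed, so everything here is an assumption made for illustration: the use of Kabsch alignment on atoms shared between fragment and core, and the hypothetical names `kabsch`, `reattach_fragment`, `fragment_xyz`, `anchor_idx`, and `core_anchor_xyz`. It is not the FragmentFlow implementation.

```python
import numpy as np


def kabsch(P, Q):
    """Rigid transform (R, t) that best maps points P onto Q in the least-squares sense.

    P and Q are (n, 3) arrays of corresponding points. The returned R is a
    proper rotation (det = +1), and P @ R.T + t approximates Q.
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def reattach_fragment(fragment_xyz, anchor_idx, core_anchor_xyz):
    """Place a substituent fragment onto a predicted reactive core.

    ASSUMPTION (not from the source): the fragment carries copies of a few
    core atoms ("anchors"); aligning those anchors to their predicted TS
    positions fixes the orientation of the whole fragment.

    fragment_xyz    : (n, 3) fragment coordinates, e.g. taken from the reactant
    anchor_idx      : indices of the anchor atoms within the fragment
    core_anchor_xyz : (k, 3) predicted TS coordinates of the same anchor atoms
    """
    fragment_xyz = np.asarray(fragment_xyz, dtype=float)
    core_anchor_xyz = np.asarray(core_anchor_xyz, dtype=float)
    R, t = kabsch(fragment_xyz[anchor_idx], core_anchor_xyz)
    return fragment_xyz @ R.T + t
```

At least three non-collinear anchor atoms are needed for the orientation to be uniquely determined. A rigid placement of this kind would only provide an initial guess; the abstract indicates that generated structures are subsequently refined by saddle-point optimization.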