Template-free retrosynthesis methods treat the task as black-box sequence generation, which limits learning efficiency, while semi-template approaches rely on rigid reaction libraries that constrain generalization. We address both limitations with a key insight: atom ordering in neural representations matters. Building on this insight, we propose a structure-aware template-free framework that encodes the two-stage nature of chemical reactions as a positional inductive bias. By placing reaction-center atoms at the head of the sequence, our method turns implicit chemical knowledge into explicit positional patterns that the model can readily capture. The proposed RetroDiT backbone, a graph transformer with rotary position embeddings, exploits this ordering to prioritize chemically critical regions. Combined with discrete flow matching, our approach decouples training from sampling and enables generation in 20--50 steps, versus 500 for prior diffusion methods. With predicted reaction centers, our method achieves state-of-the-art top-1 accuracy on both USPTO-50k (61.2%) and the large-scale USPTO-Full (51.3%). With oracle centers, accuracy reaches 71.1% and 63.4%, respectively, surpassing foundation models trained on 10 billion reactions while using orders of magnitude less data. Ablation studies further show that structural priors outperform brute-force scaling: a 280K-parameter model with reaction-center-first ordering matches a 65M-parameter model without it.
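To make the ordering idea concrete, the sketch below (not the authors' code) reorders a molecule's atoms so that reaction-center atoms occupy the sequence head before serialization. The `center_idxs` set and the aspirin example indices are illustrative assumptions; how centers are obtained (predicted or oracle) is model-specific.

```python
# Minimal sketch of reaction-center-first atom ordering with RDKit.
from rdkit import Chem

def center_first_order(smiles: str, center_idxs: set[int]) -> Chem.Mol:
    """Renumber atoms so reaction-center atoms come first in the sequence.

    `center_idxs` is assumed to hold the atom indices of the reaction
    center; obtaining them is a separate (model-specific) step.
    """
    mol = Chem.MolFromSmiles(smiles)
    n = mol.GetNumAtoms()
    # Stable partition: center atoms first (relative order kept), rest after.
    new_order = [i for i in range(n) if i in center_idxs] + \
                [i for i in range(n) if i not in center_idxs]
    return Chem.RenumberAtoms(mol, new_order)

# Illustrative only: pretend atoms 3 and 4 form the reaction center of aspirin.
reordered = center_first_order("CC(=O)Oc1ccccc1C(=O)O", {3, 4})
print(Chem.MolToSmiles(reordered, canonical=False))
```

With `canonical=False`, the printed SMILES follows the new atom order, so the effect of the permutation is directly visible.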
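The positional bias is carried by rotary position embeddings over the reordered atom sequence. The following is standard RoPE applied to query/key features, not necessarily RetroDiT's exact variant; it only illustrates how center-first ordering reserves the lowest position indices, and hence the slowest rotations, for reaction-center atoms.

```python
import torch

def rope(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Standard rotary position embedding on interleaved feature pairs.

    x: (..., seq, dim) queries or keys, dim even.
    positions: (seq,) integer atom positions after center-first reordering.
    """
    dim = x.shape[-1]
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float) / dim)  # (dim/2,)
    angles = positions.float()[:, None] * freqs[None, :]                 # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin   # rotate each 2-D feature pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```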
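The claimed speedup comes from flow matching making the number of integration steps a sampling-time choice rather than a training-time constant. Below is a schematic sampler for one common mask-based discrete flow matching variant with a linear schedule; the `model(x, t)` interface, `mask_id`, and greedy decoding are simplifying assumptions, not the paper's implementation.

```python
import torch

@torch.no_grad()
def dfm_sample(model, x_noise: torch.Tensor, num_steps: int = 20, mask_id: int = 0):
    """Schematic mask-based discrete flow matching sampler.

    x_noise: (..., seq) long tensor, initially all `mask_id`.
    model(x, t) is assumed to return (..., seq, vocab) logits over the
    clean tokens x_1 given the partially masked state x_t at time t.
    """
    x = x_noise.clone()
    ts = torch.linspace(0.0, 1.0, num_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        logits = model(x, t)                 # posterior over clean tokens
        pred = logits.softmax(dim=-1).argmax(dim=-1)  # greedy decode for brevity
        # Under a linear path, each still-masked token is revealed with
        # probability dt / (1 - t); at the final step this equals 1, so
        # every remaining position is filled in.
        dt = t_next - t
        unmask = (x == mask_id) & (torch.rand_like(x, dtype=torch.float) < dt / (1.0 - t))
        x = torch.where(unmask, pred, x)
    return x
```

Because the schedule `ts` is built at call time, choosing 20 versus 50 steps requires no retraining, which is the decoupling of training from sampling that the abstract refers to.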