Retrosynthesis planning is a fundamental challenge in chemistry that aims to design reaction pathways from commercially available starting materials to a target molecule. Each step in multi-step retrosynthesis planning requires both an accurate prediction of possible precursor molecules for the given target molecule and confidence estimates to guide heuristic search algorithms. We model single-step retrosynthesis planning as a distribution learning problem in a discrete state space. First, we introduce the Markov Bridge Model, a generative framework designed to approximate the dependency between two intractable discrete distributions that are accessible only via a finite sample of coupled data points. Our framework is based on the concept of a Markov bridge, a Markov process pinned at its endpoints. Unlike diffusion-based methods, our Markov Bridge Model does not need a tractable noise distribution as a sampling proxy and operates directly on the input product molecules as samples from the intractable prior distribution. We then address the retrosynthesis planning problem with our novel framework and introduce RetroBridge, a template-free retrosynthesis modeling approach that achieves state-of-the-art results on standard evaluation benchmarks.
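To make the notion of a Markov process pinned at its endpoints concrete, the following is a minimal sketch of a discrete bridge between two coupled samples. It is an illustration only, not the RetroBridge implementation: the function name `sample_bridge`, the label-sequence representation of a molecule, and the jump schedule `1 / (num_steps - t)` are all assumptions introduced here.

```python
import random


def sample_bridge(x, y, num_steps, seed=None):
    """Sample a trajectory z_0, ..., z_T with z_0 = x and z_T = y.

    x, y: equal-length sequences of categorical labels (hypothetical
    stand-ins for, e.g., per-atom types of a product and its reactants).
    The schedule below is an illustrative assumption, not the paper's.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")

    rng = random.Random(seed)
    state = list(x)
    trajectory = [tuple(state)]

    for t in range(num_steps):
        # Probability of jumping to the target at this step. It equals 1
        # on the final step, which pins the process at y.
        jump_prob = 1.0 / (num_steps - t)
        for i, target in enumerate(y):
            # Each position evolves independently; the target value is
            # absorbing, so the next state depends only on the current one.
            if state[i] != target and rng.random() < jump_prob:
                state[i] = target
        trajectory.append(tuple(state))

    return trajectory
```

Calling `sample_bridge(list("CCO"), list("CNO"), num_steps=5, seed=0)` returns six states that start at the first sequence and end at the second. No tractable noise distribution appears anywhere: both endpoints come from data, which is the property the abstract contrasts with diffusion-based methods. In a learned model, a network would be trained to predict the transitions when only the starting point is known.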