Constrained synthesizability is an unaddressed challenge in generative molecular design: designing molecules that satisfy multi-parameter optimization objectives while simultaneously being synthesizable and enforcing the presence of specific commercial building blocks in the synthesis. This capability is practically important for molecule re-purposing, sustainability, and efficiency. In this work, we propose a novel reward function called TANimoto Group Overlap (TANGO), which uses chemistry principles to transform a sparse reward function into a dense and learnable one, a property crucial for reinforcement learning. TANGO can augment general-purpose molecular generative models to directly optimize for constrained synthesizability while simultaneously optimizing, via reinforcement learning, for other properties relevant to drug discovery. Our framework is general and addresses starting-material, intermediate, and divergent synthesis constraints. Contrary to most existing work in the field, we show that incentivizing a general-purpose model (without any inductive biases) is a productive approach to navigating challenging optimization scenarios, and we demonstrate this by showing that the trained models explicitly learn a desirable distribution. Our framework is the first generative approach to tackle constrained synthesizability.
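The core idea of densifying a sparse reward can be illustrated with a minimal sketch. The code below is not the authors' TANGO formulation (which combines Tanimoto similarity with functional-group overlap on real chemical fingerprints); it is a toy stand-in where a character-bigram "fingerprint" over SMILES strings replaces a real fingerprint, and all function names, molecules, and rewards are hypothetical. It shows why a binary building-block-match reward gives no learning signal while a Tanimoto-based surrogate gives partial credit:

```python
# Illustrative sketch only, NOT the paper's implementation: a toy character
# n-gram set stands in for a real chemical fingerprint (e.g. Morgan bits).

def fingerprint(smiles, n=2):
    """Hypothetical stand-in fingerprint: the set of character n-grams."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def sparse_reward(route_blocks, enforced_block):
    """Binary reward: 1 only if the enforced building block appears exactly."""
    return 1.0 if enforced_block in route_blocks else 0.0

def dense_reward(route_blocks, enforced_block):
    """Dense surrogate: best similarity of any route building block to the
    enforced one, giving partial credit a policy can learn from."""
    fp_target = fingerprint(enforced_block)
    return max((tanimoto(fingerprint(b), fp_target) for b in route_blocks),
               default=0.0)

route = ["CCO", "c1ccccc1O", "CC(=O)O"]   # building blocks in a sampled route
target = "c1ccccc1N"                      # enforced block (hypothetical)

print(sparse_reward(route, target))  # 0.0: exact match fails, no signal
print(dense_reward(route, target))   # > 0: partial credit for the close analog
```

Under the sparse reward, nearly every sampled molecule scores zero and the policy gradient vanishes; the dense surrogate ranks near-misses, which is what makes the objective learnable.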