Designing molecules that are both property-optimal and readily synthesizable is a central challenge in drug discovery. Existing methods that consider synthesizability can jointly output predicted synthesis routes for generated molecules. However, little attention has been paid to addressing the ease of synthesis or to flexibly incorporating desired reaction constraints. Virtual screening, on the other hand, searches for commercially available compounds but is difficult to scale to ultra-large (billion-size and beyond) chemical spaces. Here, we propose a generative design framework that unifies synthesis-constrained molecular design and ultra-large-scale virtual screening through steerable and granular synthesizability control. Generated molecules satisfy arbitrary multi-parameter optimization objectives, and their predicted synthesis routes satisfy mix-and-match constraints: including or avoiding certain reactions, incorporating specific building blocks, and minimizing synthesis route length. In an end-to-end in-house campaign targeting BRD4, we designed molecules synthesizable with specifically selected reactions and building blocks, synthesized all six selected compounds, and identified two micromolar binders. We further demonstrate that reaction control enables efficient navigation of ultra-large make-on-demand chemical spaces to identify property-optimal candidates. Applying our framework to Chemspace's Freedom 4.0 make-on-demand space (142 billion molecules), we generated ~320k molecules (0.00023% of the library) on a single consumer-grade GPU with only 8 GB of GPU memory and identified a micromolar Wee1 binder among 60 synthesized candidates. This single unified framework thus enables both generating novel synthesizable molecules and retrieving catalogue-ready candidates, offering a flexible solution for mitigating the synthesizability bottleneck.