Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search

Drug discovery is slow and expensive, and traditional high-throughput screening and docking-based virtual screening are hampered by low success rates and limited scalability. Recent advances in generative modeling, including autoregressive, diffusion, and flow-based approaches, have enabled de novo ligand design beyond the limits of enumerative screening. Yet these models often generalize poorly, offer limited interpretability, and overemphasize binding affinity at the expense of key pharmacological properties, which restricts their translational utility. Here we present Trio, a molecular generation framework that integrates fragment-based molecular language modeling, reinforcement learning, and Monte Carlo tree search for effective and interpretable closed-loop targeted molecular design. Through these three components, Trio enables context-aware fragment assembly, enforces physicochemical and synthetic feasibility, and balances the search between exploring novel chemotypes and exploiting promising intermediates within protein binding pockets. Experimental results show that Trio reliably generates chemically valid and pharmacologically enhanced ligands, outperforming state-of-the-art approaches in binding affinity (+7.85%), drug-likeness (+11.10%), and synthetic accessibility (+12.05%), while expanding molecular diversity more than fourfold. By combining generalization, plausibility, and interpretability, Trio establishes a closed-loop generative paradigm for navigating chemical space and offers a foundation for the next era of AI-driven drug discovery.
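The balance between exploration and exploitation in Monte Carlo tree search is commonly implemented with the UCT (Upper Confidence bound applied to Trees) selection rule. The sketch below illustrates that generic rule only; the abstract does not state which selection rule Trio uses, so the function names, the `(name, total_reward, visits)` tuple layout, and the exploration constant `c` are all hypothetical choices made for illustration.

```python
import math


def uct_score(total_reward, visits, parent_visits, c=1.4):
    """Generic UCT score: mean reward plus an exploration bonus.

    Assumption: rewards are scalar scores (e.g. a combined property score);
    this is not taken from the Trio paper.
    """
    if visits == 0:
        # Unvisited children are always tried first.
        return math.inf
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, parent_visits, c=1.4):
    """Return the name of the child with the highest UCT score.

    `children` is a list of (name, total_reward, visits) tuples, a
    hypothetical stand-in for candidate fragment extensions.
    """
    best = max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits, c))
    return best[0]


if __name__ == "__main__":
    candidates = [("frag_a", 9.0, 10), ("frag_b", 0.5, 1)]
    # With c > 0 the rarely visited child gets a large exploration bonus.
    print(select_child(candidates, parent_visits=11))
    # With c = 0 the rule is purely greedy on the mean reward.
    print(select_child(candidates, parent_visits=11, c=0.0))
```

A larger `c` favors rarely visited branches (novel chemotypes, in the abstract's terms), while a smaller `c` concentrates the search on branches with a high average reward (promising intermediates).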
