Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis

Language models are reshaping biochemistry, helping scientists design drugs and plan chemical syntheses efficiently. Yet current approaches are caught between small language models, which are prone to hallucination and retain limited knowledge, and large cloud-based language models, which carry privacy risks and high inference costs. To bridge this gap, we introduce ChemCRAFT, a novel framework that uses agentic reinforcement learning to decouple chemical reasoning from knowledge storage. Instead of forcing the model to memorize vast amounts of chemical data, our approach lets the language model interact with a sandbox for precise information retrieval. Externalizing knowledge in this way allows a locally deployable small model to achieve superior performance at minimal inference cost. To give small language models the ability to call agents, we build an agentic trajectory construction pipeline and a comprehensive chemical-agent sandbox. From the sandbox interactions, we construct ChemToolDataset, the first large-scale chemical tool trajectory dataset. We also propose SMILES-GRPO, which builds a dense chemical reward function to strengthen the model's ability to call chemical agents. Evaluations across diverse aspects of drug design show that ChemCRAFT outperforms current cloud-based LLMs in molecular structure analysis, molecular optimization, and synthesis pathway prediction, demonstrating that scientific reasoning is not solely an emergent ability of model scale but a learnable policy of tool orchestration. This work establishes a cost-effective and privacy-preserving paradigm for AI-aided chemistry, opening new avenues for accelerating molecular discovery with locally deployable agents. Code is available at https://github.com/HowardLi1984/ChemCraft.
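
The abstract does not spell out how SMILES-GRPO computes its reward, so the following is only a rough sketch of the general idea: a dense (graded rather than pass/fail) reward over SMILES strings, combined with the group-relative advantage normalization used in GRPO. The function names and the character-bigram similarity are assumptions made for illustration; a real chemical reward would more plausibly rely on molecular fingerprints or validity checks from a cheminformatics toolkit.

```python
from statistics import mean, pstdev


def bigrams(smiles: str) -> set[str]:
    """Character bigrams of a SMILES string.

    Assumption: a crude stand-in for a molecular fingerprint,
    chosen only to keep the sketch dependency-free.
    """
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


def dense_reward(predicted: str, target: str) -> float:
    """Tanimoto (Jaccard) similarity over bigram sets, in [0, 1].

    Hypothetical reward: partial credit for partially correct
    outputs instead of a sparse exact-match signal.
    """
    a, b = bigrams(predicted), bigrams(target)
    if not a and not b:
        # Strings too short to have bigrams: fall back to exact match.
        return 1.0 if predicted == target else 0.0
    return len(a & b) / len(a | b)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    target = "CCO"  # ethanol
    group = ["CCO", "CCN", "c1ccccc1"]  # candidate answers for one prompt
    rewards = [dense_reward(s, target) for s in group]
    advantages = group_relative_advantages(rewards)
    for s, r, a in zip(group, rewards, advantages):
        print(f"{s:10s} reward={r:.3f} advantage={a:+.3f}")
```

In this toy group, the exact match receives the largest advantage, the near miss a smaller one, and the unrelated molecule a negative one, which is the kind of graded signal a dense reward is meant to provide.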
