Over the last few decades, excellent computational chemistry tools have been developed. Integrating them into a single platform with enhanced accessibility could help them reach their full potential by overcoming steep learning curves. Recently, large language models (LLMs) have shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 18 expert-designed tools, ChemCrow augments LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts, and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and ChemCrow's performance. Our work not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.