Large language models (LLMs) have recently shown strong performance on tasks across many domains, but they struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, which limits their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments LLM performance in chemistry, and new capabilities emerge. Our evaluation, which includes both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and the performance of GPT-4 + ChemCrow. Tools like ChemCrow carry a significant risk of misuse, and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.