MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses

Scientific discovery contributes greatly to human society's prosperity, and recent progress shows that LLMs could potentially catalyze this process. However, it is still unclear whether LLMs can discover novel and valid hypotheses in chemistry. In this work, we investigate this central research question: Can LLMs automatically discover novel and valid chemistry research hypotheses given only a chemistry research background (consisting of a research question and/or a background survey), without limitation on the domain of the research question? After extensive discussions with chemistry experts, we propose an assumption that a majority of chemistry hypotheses can result from a research background and several inspirations. With this key insight, we break the central question into three smaller fundamental questions. In brief, they are: (1) given a background question, whether LLMs can retrieve good inspirations; (2) given the background and inspirations, whether LLMs can derive the hypothesis; and (3) whether LLMs can identify good hypotheses and rank them higher. To investigate these questions, we construct a benchmark of 51 chemistry papers published in 2024 in Nature, Science, or venues of similar standing (all papers became available online only in 2024). Chemistry PhD students divide every paper into three components: background, inspirations, and hypothesis. The goal is to rediscover the hypothesis using LLMs trained on data up to 2023, given only the background and a large, randomly selected chemistry literature corpus that contains the ground-truth inspiration papers. We also develop an LLM-based multi-agent framework that leverages this assumption and consists of three stages mirroring the three smaller questions. The proposed method rediscovers many hypotheses that are highly similar to the ground-truth ones and cover their main innovations.
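To make the three-question decomposition concrete, here is a toy sketch of how the stages could chain into one pipeline: retrieve inspirations for a background, compose candidate hypotheses, then rank them. This is not the MOOSE-Chem implementation. Every function name is hypothetical, and the word-overlap (Jaccard) score is an assumed stand-in for the LLM judgments the framework would actually make at each stage.

```python
from typing import Callable


def overlap_score(a: str, b: str) -> float:
    """Jaccard word overlap; a hypothetical stand-in for an LLM judgment."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def retrieve_inspirations(background: str, corpus: list[str], k: int = 2) -> list[str]:
    """Stage 1 (question 1): keep the k corpus entries scored most relevant to the background."""
    ranked = sorted(corpus, key=lambda doc: overlap_score(background, doc), reverse=True)
    return ranked[:k]


def compose_hypotheses(background: str, inspirations: list[str]) -> list[str]:
    """Stage 2 (question 2): pair the background with each inspiration.

    An LLM would write a real hypothesis here; string joining only marks the combination.
    """
    return [f"{background} using {insp}" for insp in inspirations]


def rank_hypotheses(hypotheses: list[str], score: Callable[[str], float]) -> list[str]:
    """Stage 3 (question 3): order candidate hypotheses from highest to lowest score."""
    return sorted(hypotheses, key=score, reverse=True)


def run_pipeline(background: str, corpus: list[str],
                 score: Callable[[str], float], k: int = 2) -> list[str]:
    """Chain the three stages: retrieval, composition, ranking."""
    inspirations = retrieve_inspirations(background, corpus, k)
    candidates = compose_hypotheses(background, inspirations)
    return rank_hypotheses(candidates, score)


if __name__ == "__main__":
    background = "improve lithium battery electrolyte stability"
    corpus = [
        "ionic liquid electrolyte stability",
        "protein folding dynamics",
        "lithium battery electrolyte additive",
    ]
    # Hypothetical quality criterion standing in for an LLM-based ranker.
    judge = lambda h: overlap_score(h, "electrolyte additive")
    for hypothesis in run_pipeline(background, corpus, judge):
        print(hypothesis)
```

Passing the ranker in as a callable keeps stage 3 independent of how candidates were produced, so the placeholder `judge` can be swapped for any other scoring function without touching the retrieval or composition steps.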
