The promise of autonomous scientific discovery (ASD) hinges not only on answering questions, but also on knowing which questions to ask. Most recent work in ASD explores the use of large language models (LLMs) in goal-driven settings, relying on human-specified research questions to guide hypothesis generation. However, scientific discovery may be accelerated further by allowing the AI system to drive exploration by its own criteria. The few existing approaches in open-ended ASD select hypotheses based on diversity heuristics or subjective proxies for human interestingness, but the former struggle to meaningfully navigate the typically vast hypothesis space, and the latter suffer from imprecise definitions. This paper presents AutoDiscovery -- a method for open-ended ASD that instead drives scientific exploration using Bayesian surprise. Here, we quantify the epistemic shift from the LLM's prior beliefs about a hypothesis to its posterior beliefs after gathering experimental results. To efficiently explore the space of nested hypotheses, our method employs a Monte Carlo tree search (MCTS) strategy with progressive widening, using surprisal as the reward function. We evaluate AutoDiscovery in the setting of data-driven discovery across 21 real-world datasets spanning domains such as biology, economics, finance, and behavioral science. Our results demonstrate that under a fixed budget, AutoDiscovery substantially outperforms competitors by producing 5-29% more discoveries deemed surprising by the LLM. Our human evaluation further reveals that two-thirds of discoveries made by our system are surprising to domain experts as well, suggesting this is an important step towards building open-ended ASD systems.
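To make the notion of Bayesian surprise concrete, the following is a minimal sketch, not the authors' implementation. It assumes, purely for illustration, that the LLM's belief in a hypothesis is summarized as a single probability that the hypothesis holds, and it measures surprise as the KL divergence from the prior belief to the posterior belief. The abstract does not specify how beliefs are represented or elicited, so the function names `bernoulli_kl` and `bayesian_surprise` and the Bernoulli belief model are assumptions.

```python
import math


def bernoulli_kl(q: float, p: float, eps: float = 1e-12) -> float:
    """Return KL(Bernoulli(q) || Bernoulli(p)) in nats.

    Probabilities are clamped to [eps, 1 - eps] to avoid log(0).
    """
    q = min(max(q, eps), 1.0 - eps)
    p = min(max(p, eps), 1.0 - eps)
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


def bayesian_surprise(prior: float, posterior: float) -> float:
    """Hypothetical surprise score: KL divergence of posterior from prior.

    `prior` is the assumed belief that the hypothesis is true before the
    experiment; `posterior` is the belief after seeing the results.
    """
    return bernoulli_kl(posterior, prior)


if __name__ == "__main__":
    # A large belief shift scores higher than a small one.
    print(bayesian_surprise(prior=0.2, posterior=0.9))
    print(bayesian_surprise(prior=0.2, posterior=0.3))
```

Under this sketch, a hypothesis whose experimental result leaves the belief unchanged scores zero, while a result that overturns a confident prior scores high. A score of this kind could serve as the reward signal in a tree search over hypotheses.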