Large language models (LLMs) are increasingly used in scientific research and discovery, supporting tasks ranging from literature retrieval and synthesis to hypothesis generation, autonomous experimentation, and research evaluation. Existing surveys often conflate scientific research with scientific discovery and typically organize systems by domain, task, or autonomy level alone. In this survey, we propose a four-role framework for understanding LLMs in scientific innovation: Assistant, Collaborator, Scientist, and Evaluator. The framework integrates three complementary dimensions (autonomy level, cognitive function, and scientific innovation) to distinguish research-oriented support from frontier-oriented discovery. We review representative methods, benchmarks, and evaluation practices for each role, examining their capabilities, limitations, and human oversight requirements. Across the literature, Assistant systems are comparatively mature in retrieval and synthesis but remain unreliable in open-ended applications; Collaborator systems expand the space of candidate hypotheses yet struggle with the trade-off between novelty and grounding; Scientist systems increasingly automate research workflows but face reliability and safety bottlenecks; and Evaluator systems support review and verification while remaining weak in novelty assessment. We argue that progress in AI for science depends not only on model capability but also on evaluation, oversight, accountability, and institutional integration.