The emerging paradigm of AI co-scientists focuses on tasks characterized by repeatable verification, where agents explore search spaces in 'guess and check' loops. This paradigm does not extend to problems where repeated evaluation is impossible and ground truth is established by consensus synthesis of theory and existing evidence. We evaluate a Gemini-based AI environment designed to support collaborative scientific assessment, integrated into a standard scientific workflow. In collaboration with a diverse group of 13 climate scientists, we tested the system on a complex topic: the stability of the Atlantic Meridional Overturning Circulation (AMOC). Our results show that AI can accelerate the scientific workflow: the group produced a comprehensive synthesis of 79 papers through 104 revision cycles in just over 46 person-hours. The AI contribution was significant: most AI-generated content was retained in the report, and AI also helped maintain logical consistency and presentation quality. However, expert additions were crucial to making the report acceptable: less than half of the report was produced by AI. Furthermore, substantial oversight was required to expand and elevate the content to rigorous scientific standards.