Climate data science remains constrained by fragmented data sources, heterogeneous formats, and the steep technical expertise required to work with them. These barriers slow discovery, limit participation, and undermine reproducibility. We present AutoClimDS, a Minimum Viable Product (MVP) agentic AI system that addresses these challenges by integrating a curated climate knowledge graph (KG) with a set of agentic AI workflows designed for cloud-native scientific analysis. The KG unifies datasets, metadata, tools, and workflows into a machine-interpretable structure, while AI agents, powered by generative models, enable natural-language query interpretation, automated data discovery, programmatic data acquisition, and end-to-end climate analysis. A key result is that AutoClimDS can reproduce published scientific figures and analyses from natural-language instructions alone, completing the entire workflow from dataset selection through preprocessing to modeling. Given the same tasks, state-of-the-art general-purpose LLMs (e.g., ChatGPT GPT-5.1) using standard web access cannot independently identify authoritative datasets or construct valid retrieval workflows. This highlights the necessity of structured scientific memory for agentic scientific reasoning. By encoding procedural workflow knowledge into a KG and integrating it with existing technologies (cloud APIs, LLMs, sandboxed execution), AutoClimDS demonstrates that the KG is the essential enabling component and irreplaceable structural foundation for autonomous climate data science. This approach provides a pathway toward democratizing climate research through human-AI collaboration.