Ocean science, which studies the oceans as reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recent advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite their success in other domains, current LLMs often fall short of the needs of domain experts such as oceanographers, and the potential of LLMs for ocean science remains under-explored. The underlying reason may be the immense and intricate nature of ocean data, as well as the need for finer granularity and greater richness of knowledge. To alleviate these issues, we introduce OceanGPT, the first LLM in the ocean domain, which is expert in various ocean science tasks. We propose DoInstruct, a novel framework that automatically obtains a large volume of ocean domain instruction data by generating instructions through multi-agent collaboration. Additionally, we construct the first oceanography benchmark, OceanBench, to evaluate the capabilities of LLMs in the ocean domain. In comprehensive experiments, OceanGPT not only shows a higher level of knowledge expertise for ocean science tasks but also gains preliminary embodied intelligence capabilities in ocean technology. Code, data, and checkpoints will soon be available at https://github.com/zjunlp/KnowLM.