MarineGPT: Unlocking Secrets of Ocean to the Public

Large language models (LLMs) such as ChatGPT/GPT-4 have proven to be powerful AI assistants that improve the user experience. Ongoing work proposes multi-modal large language models (MLLMs), which give LLMs the ability to perceive inputs from multiple modalities by constructing a joint semantic space (e.g., a visual-text space). Although LLMs and MLLMs have achieved significant success, their use in domain-specific applications that require domain knowledge and expertise has been explored far less, especially in the \textbf{marine domain}. Unlike general-purpose MLLMs, a marine-specific MLLM is required to yield much more \textbf{sensitive}, \textbf{informative}, and \textbf{scientific} responses. In this work, we demonstrate that existing MLLMs optimized on huge amounts of readily available general-purpose training data show only minimal ability to understand domain-specific intents and to generate informative and satisfactory responses. To address these issues, we propose \textbf{MarineGPT}, the first vision-language model specially designed for the marine domain, unlocking the secrets of the ocean to the public. We present our \textbf{Marine-5M} dataset, with more than 5 million marine image-text pairs, to inject domain-specific marine knowledge into our model and achieve better marine vision and language alignment. MarineGPT not only extends marine understanding to the general public but also offers a standard protocol for adapting a general-purpose assistant into a downstream domain-specific expert. We pave the way for a wide range of marine applications while providing valuable data and pre-trained models for future research in both academic and industrial communities.

