Large language models (LLMs) have demonstrated a powerful ability to answer a wide range of queries as general-purpose assistants. Multi-modal large language models (MLLMs), which continue to advance, extend LLMs with the ability to perceive visual signals. The launch of GPT-4 (Generative Pre-trained Transformer) has generated significant interest in the research community, and GPT-4V(ision) has shown considerable capability in both academia and industry, becoming a focal point of a new generation of artificial intelligence. Despite the success of GPT-4V, the use of MLLMs for domain-specific analysis (e.g., marine analysis), which requires domain-specific knowledge and expertise, has received less attention. In this study, we carry out a preliminary yet comprehensive case study of using GPT-4V for marine analysis. This report systematically evaluates the existing GPT-4V, assessing its performance on marine research and setting a new standard for future developments in MLLMs. The experimental results show that the responses generated by GPT-4V are still far from satisfying the domain-specific requirements of the marine professions. All images and prompts used in this study will be available at https://github.com/hkust-vgd/Marine_GPT-4V_Eval