Large Vision-Language Models (LVLMs) have demonstrated outstanding performance in general multimodal applications such as image recognition and visual reasoning, and have shown promising potential in specialized domains. However, their potential in the insurance domain, which is characterized by rich application scenarios and abundant multimodal data, has not been effectively explored. There is neither a systematic review of multimodal tasks in the insurance domain nor a benchmark specifically designed to evaluate the capabilities of LVLMs in insurance, and this gap hinders the development of LVLMs within the domain. In this paper, we systematically review and distill multimodal tasks for four representative types of insurance: auto insurance, property insurance, health insurance, and agricultural insurance. We propose INS-MMBench, the first comprehensive LVLM benchmark tailored to the insurance domain. INS-MMBench comprises 2.2K carefully designed multiple-choice questions covering 12 meta-tasks and 22 fundamental tasks. Furthermore, we evaluate multiple representative LVLMs, including closed-source models such as GPT-4o and open-source models such as BLIP-2. This evaluation not only validates the effectiveness of our benchmark but also provides an in-depth performance analysis of current LVLMs on various multimodal tasks in the insurance domain. We hope that INS-MMBench will facilitate the further application of LVLMs in the insurance domain and inspire interdisciplinary development. Our dataset and evaluation code are available at https://github.com/FDU-INS/INS-MMBench.