Driven by large foundation models, artificial intelligence has made tremendous progress recently, attracting a surge of interest from the general public. In this study, we assess the performance of OpenAI's newest model, GPT-4V(ision), specifically in the realm of multimodal medical diagnosis. Our evaluation covers 17 human body systems: Central Nervous System, Head and Neck, Cardiac, Chest, Hematology, Hepatobiliary, Gastrointestinal, Urogenital, Gynecology, Obstetrics, Breast, Musculoskeletal, Spine, Vascular, Oncology, Trauma, and Pediatrics. The images are drawn from 8 modalities used in routine clinical practice, namely X-ray, Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), Digital Subtraction Angiography (DSA), Mammography, Ultrasound, and Pathology. We probe GPT-4V's ability on multiple clinical tasks, including imaging modality and anatomy recognition, disease diagnosis, report generation, and disease localisation, both with and without patient history provided. Our observations show that, while GPT-4V is proficient at distinguishing between medical image modalities and anatomical regions, it faces significant challenges in disease diagnosis and in generating comprehensive reports. These findings underscore that, although large multimodal models have made significant advances in computer vision and natural language processing, they remain far from being able to effectively support real-world medical applications and clinical decision-making. All images used in this report are available at https://github.com/chaoyi-wu/GPT-4V_Medical_Evaluation.