Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES, a benchmark aimed at comprehensively evaluating the trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions: trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed-ended and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit trustworthiness concerns, often producing factual inaccuracies and failing to maintain fairness across demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code at https://cares-ai.github.io/.