Recent advances in large vision-language models (VLMs) tailored for autonomous driving (AD) have demonstrated strong scene understanding and reasoning capabilities, making them promising candidates for end-to-end driving systems. However, little work has studied the trustworthiness of DriveVLMs -- a critical factor that directly impacts public transportation safety. In this paper, we introduce AutoTrust, a comprehensive trustworthiness benchmark for large vision-language models in autonomous driving (DriveVLMs), covering diverse perspectives -- including trustfulness, safety, robustness, privacy, and fairness. We construct the largest visual question-answering dataset for investigating trustworthiness issues in driving scenarios, comprising over 10k unique scenes and 18k queries. We evaluate six publicly available VLMs, spanning generalist to specialist and open-source to commercial models. Our exhaustive evaluations unveil previously undiscovered vulnerabilities of DriveVLMs to trustworthiness threats. Specifically, we find that general VLMs such as LLaVA-v1.6 and GPT-4o-mini surprisingly outperform specialized models fine-tuned for driving in terms of overall trustworthiness. DriveVLMs such as DriveLM-Agent are particularly vulnerable to disclosing sensitive information. Additionally, both generalist and specialist VLMs remain susceptible to adversarial attacks and struggle to ensure unbiased decision-making across diverse environments and populations. Our findings call for immediate and decisive action to address the trustworthiness of DriveVLMs -- an issue of critical importance to public safety and the welfare of all citizens relying on autonomous transportation systems. Our benchmark is publicly available at \url{https://github.com/taco-group/AutoTrust}, and the leaderboard is released at \url{https://taco-group.github.io/AutoTrust/}.