Large Vision-Language Models (LVLMs) have received widespread attention for advancing interpretable self-driving. Existing evaluations of LVLMs primarily focus on multi-faceted capabilities in natural circumstances, lacking automated, quantifiable assessment for self-driving, let alone for severe road corner cases. In this paper, we propose CODA-LM, the first benchmark for the automatic evaluation of LVLMs on self-driving corner cases. We adopt a hierarchical data structure to prompt powerful LVLMs to analyze complex driving scenes and generate high-quality pre-annotations for human annotators. For LVLM evaluation, we show that using text-only large language models (LLMs) as judges yields even better alignment with human preferences than LVLM judges. Moreover, with CODA-LM, we build CODA-VLM, a new driving LVLM that surpasses all open-source counterparts on CODA-LM. CODA-VLM performs comparably to GPT-4V, even surpassing GPT-4V by +21.42% on the regional perception task. We hope CODA-LM can become a catalyst promoting interpretable self-driving empowered by LVLMs.
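To make the evaluation finding concrete: the claim is that a text-only LLM judge, given only a reference annotation and a candidate answer (no image), aligns better with human preference than an LVLM judge. Below is a minimal sketch of such a judge, assuming the OpenAI Python SDK; the prompt wording, the 1-10 rubric, and the helper name `judge_score` are illustrative assumptions, not the actual CODA-LM judging protocol.

```python
# Minimal text-only LLM-as-judge sketch (hypothetical prompt and rubric;
# the actual CODA-LM judge configuration is defined by the authors).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are a judge for driving-scene descriptions.
Reference answer:
{reference}

Model answer:
{candidate}

Rate the model answer against the reference on a 1-10 scale for
accuracy and completeness. Reply with the integer score only."""

def judge_score(reference: str, candidate: str) -> int:
    """Ask a text-only LLM to grade a candidate answer against a reference."""
    response = client.chat.completions.create(
        model="gpt-4",  # any strong text-only LLM can serve as the judge
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(reference=reference,
                                           candidate=candidate),
        }],
        temperature=0,  # deterministic grading
    )
    return int(response.choices[0].message.content.strip())
```

Because the judge sees only text, the image itself never enters the comparison; the reference annotation stands in for the visual ground truth, which is what lets a text-only LLM act as the grader.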