Autonomous driving (AD) systems are becoming increasingly capable of handling complex tasks, largely due to recent advances in deep learning and AI. As interactions between autonomous systems and humans grow, the interpretability of a driving system's decision-making process becomes increasingly crucial for ensuring safe operation. Successful human-machine interaction requires understanding the system's underlying representations of the environment and the driving task, which remains a significant challenge for deep learning-based systems. To address this, we introduce the task of driver intent prediction (DIP): interpretable anticipation of a driver's maneuver before it occurs, which plays a critical role in the safety of AD systems. To foster research in interpretable DIP, we curate the eXplainable Driving Action Anticipation Dataset (DAAD-X), a new multimodal, ego-centric video dataset that provides hierarchical, high-level textual explanations as causal reasoning for the driver's decisions. These explanations are derived from both the driver's eye gaze and the ego-vehicle's perspective. Next, we propose the Video Concept Bottleneck Model (VCBM), a framework that inherently generates spatio-temporally coherent explanations, without relying on post-hoc techniques. Finally, through extensive evaluation of the proposed VCBM on the DAAD-X dataset, we demonstrate that transformer-based models exhibit greater interpretability than conventional CNN-based models. Additionally, we introduce a multilabel t-SNE visualization technique to illustrate the disentanglement and causal correlation among multiple explanations. Our data, code, and models are available at: https://mukil07.github.io/VCBM.github.io/