Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction

This paper addresses the problem of predicting hazards that drivers may encounter while driving a car. We formulate it as the task of anticipating impending accidents from a single input image captured by a car dashcam. Unlike existing approaches to driving hazard prediction that rely on computational simulations or anomaly detection from videos, this study focuses on high-level inference from static images. The problem requires predicting and reasoning about future events based on uncertain observations, which falls under visual abductive reasoning. To enable research in this understudied area, we create a new dataset named DHPR (Driving Hazard Prediction and Reasoning). The dataset consists of 15K dashcam images of street scenes, and each image is associated with a tuple containing the car speed, a hypothesized hazard description, and the visual entities present in the scene. These tuples are provided by human annotators, who identify risky scenes and describe potential accidents that could occur a few seconds later. We present several baseline methods and evaluate their performance on our dataset, identifying remaining issues and discussing future directions. This study contributes to the field by introducing a novel problem formulation and dataset, enabling researchers to explore the potential of multi-modal AI for driving hazard prediction.
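As a rough illustration of the per-image annotation tuple described above, the following Python sketch models one record. All class and field names, the bounding-box representation of a visual entity, and the example values are assumptions made for illustration; the abstract does not specify the dataset's actual schema or the unit of the speed value.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualEntity:
    # Hypothetical representation: a text label plus a pixel box
    # (x_min, y_min, x_max, y_max). The real format is not given in the abstract.
    label: str
    box: Tuple[float, float, float, float]


@dataclass
class HazardAnnotation:
    # One dashcam image and its associated annotation tuple.
    image_path: str
    car_speed: float            # unit is not stated in the abstract
    hazard_description: str     # hypothesized accident a few seconds ahead
    entities: List[VisualEntity] = field(default_factory=list)


def entity_labels(record: HazardAnnotation) -> List[str]:
    """Return the labels of all visual entities attached to a record."""
    return [entity.label for entity in record.entities]


# Made-up example record, not taken from the dataset.
example = HazardAnnotation(
    image_path="images/000001.jpg",
    car_speed=30.0,
    hazard_description="A pedestrian may step out from behind the parked van.",
    entities=[VisualEntity("parked van", (120.0, 80.0, 340.0, 260.0))],
)
```

Keeping the entities as a list on the record mirrors the description of a tuple that groups speed, hazard text, and scene entities per image; any real loader would need to follow the dataset's published format instead.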

