This paper addresses the problem of predicting hazards that drivers may encounter while driving a car. We formulate it as the task of anticipating impending accidents from a single input image captured by a car dashcam. Unlike existing approaches to driving hazard prediction, which rely on computational simulations or anomaly detection from videos, this study focuses on high-level inference from static images. The problem requires predicting and reasoning about future events based on uncertain observations, which falls under visual abductive reasoning. To enable research in this understudied area, we create a new dataset, the DHPR (Driving Hazard Prediction and Reasoning) dataset. It consists of 15K dashcam images of street scenes, each associated with a tuple containing the car speed, a hypothesized hazard description, and the visual entities present in the scene. These are annotated by human annotators, who identify risky scenes and describe potential accidents that could occur a few seconds later. We present several baseline methods and evaluate their performance on our dataset, identifying remaining issues and discussing future directions. This study contributes to the field by introducing a novel problem formulation and dataset, enabling researchers to explore the potential of multi-modal AI for driving hazard prediction.