Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs). The task is challenging because traffic accidents are unpredictable and follow a long-tail distribution, traffic scene dynamics are intricate, and onboard cameras have an inherently limited field of view. To address these challenges, this study introduces a novel accident anticipation framework for AVs, termed CRASH, which integrates five components: an object detector, a feature extractor, an object-aware module, a context-aware module, and a multi-layer fusion module. Specifically, we develop the object-aware module to prioritize high-risk objects in complex and ambiguous environments by computing the spatial-temporal relationships between traffic agents. In parallel, we devise the context-aware module to extend global visual information from the temporal domain to the frequency domain using the Fast Fourier Transform (FFT), capturing both fine-grained visual features of potential objects and broader contextual cues within traffic scenes. To capture a wider range of visual cues, we further propose the multi-layer fusion module, which dynamically computes the temporal dependencies between different scenes and iteratively updates the correlations between different visual features, enabling accurate and timely accident prediction. We evaluate CRASH on three real-world datasets: the Dashcam Accident Dataset (DAD), the Car Crash Dataset (CCD), and the AnAn Accident Detection (A3D) dataset. Our model outperforms existing leading baselines on key evaluation metrics such as Average Precision (AP) and mean Time-To-Accident (mTTA). Notably, its robustness and adaptability are most evident in challenging driving scenarios with missing or limited training data, demonstrating significant potential for application in real-world autonomous driving systems.
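To make the mTTA metric concrete, here is a minimal sketch. It assumes a common definition from the accident-anticipation literature, not CRASH's actual evaluation code. For each positive (accident) video, TTA is the time between the first frame whose predicted accident probability reaches a threshold and the accident onset. Videos that raise no alarm before onset are excluded from the average. The `Video` record, the function names, and the threshold-sweep averaging in `mtta_over_thresholds` are hypothetical illustrations. Published protocols differ in how they sweep thresholds and how they treat thresholds with no detections.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Video:
    """Hypothetical container for one evaluation clip."""
    scores: Sequence[float]        # per-frame predicted accident probability
    accident_frame: Optional[int]  # frame index of accident onset; None for negative clips


def time_to_accident(scores: Sequence[float], accident_frame: int,
                     threshold: float, fps: float) -> Optional[float]:
    """Seconds between the first alarm (score >= threshold) and accident onset.

    Only frames strictly before the onset count; returns None if the model
    never raises an alarm in time.
    """
    for t, score in enumerate(scores[:accident_frame]):
        if score >= threshold:
            return (accident_frame - t) / fps
    return None


def mean_tta(videos: Sequence[Video], threshold: float, fps: float) -> float:
    """Average TTA over positive clips that were anticipated at this threshold."""
    ttas: List[float] = []
    for video in videos:
        if video.accident_frame is None:
            continue  # negative clips have no accident to anticipate
        tta = time_to_accident(video.scores, video.accident_frame, threshold, fps)
        if tta is not None:
            ttas.append(tta)
    # Assumption: a threshold that anticipates nothing contributes 0 seconds.
    return sum(ttas) / len(ttas) if ttas else 0.0


def mtta_over_thresholds(videos: Sequence[Video], thresholds: Sequence[float],
                         fps: float) -> float:
    """Hypothetical variant that also averages mean TTA over a threshold sweep."""
    return sum(mean_tta(videos, th, fps) for th in thresholds) / len(thresholds)


# Toy example at 2 frames per second (made-up scores, not results from any dataset).
videos = [
    Video(scores=[0.1, 0.2, 0.6, 0.7, 0.9], accident_frame=4),  # alarm at frame 2
    Video(scores=[0.9, 0.1, 0.1], accident_frame=None),         # negative clip
    Video(scores=[0.1, 0.1, 0.1, 0.8], accident_frame=3),       # alarm only at onset
]
print(mean_tta(videos, threshold=0.5, fps=2))                       # 1.0
print(mtta_over_thresholds(videos, thresholds=[0.05, 0.5], fps=2))  # 1.375
```

The search is restricted to frames before onset, so a late alarm earns no credit. This is why mTTA rewards timely predictions, while AP measures whether accident and non-accident clips are ranked correctly.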