In the past several years, road anomaly segmentation has been actively explored in academia and has drawn growing attention in industry. The rationale is straightforward: if an autonomous car can brake before hitting an anomalous object, safety is improved. However, this rationale naturally calls for a temporally informed setting, whereas existing methods and benchmarks are designed in an unrealistic frame-wise manner. To bridge this gap, we contribute the first video anomaly segmentation dataset for autonomous driving. Since placing various anomalous objects on busy roads and annotating them in every frame is dangerous and expensive, we resort to synthetic data. To improve the relevance of this synthetic dataset to real-world applications, we train a generative adversarial network conditioned on rendering G-buffers to enhance photorealism. Our dataset consists of 120,000 high-resolution frames recorded at 60 FPS in 7 different towns. As an initial benchmark, we provide baselines using the latest supervised and unsupervised road anomaly segmentation methods. Beyond conventional metrics, we focus on two new ones: temporal consistency and latency-aware streaming accuracy. We believe the latter is valuable because it measures whether an anomaly segmentation algorithm can truly prevent a car from crashing in a temporally informed setting.
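The core idea behind latency-aware streaming evaluation can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `streaming_matches`, the 50 ms inference latency, and the timestamps are all assumptions chosen for the example. At each ground-truth frame, the evaluator scores the most recent prediction whose inference has actually finished, so a slow model is penalized for acting on stale information.

```python
def streaming_matches(gt_times_ms, pred_start_ms, latency_ms):
    """For each ground-truth timestamp, return the index of the most
    recent prediction whose result is already available (i.e. its
    inference, started at pred_start_ms[i], has completed by then),
    or None if no prediction is ready yet."""
    matches = []
    for t in gt_times_ms:
        ready = [i for i, s in enumerate(pred_start_ms)
                 if s + latency_ms <= t]
        matches.append(ready[-1] if ready else None)
    return matches

# Frames arrive roughly every 17 ms (~60 FPS); inference is assumed
# to take 50 ms per frame and to start as soon as each frame arrives.
FRAME_MS = 17
LATENCY_MS = 50
gt_times = [i * FRAME_MS for i in range(6)]   # [0, 17, 34, 51, 68, 85]

print(streaming_matches(gt_times, gt_times, LATENCY_MS))
# → [None, None, None, 0, 1, 2]
```

The printed result shows the latency penalty directly: the first three frames have no prediction available at all, and every later frame is matched against a prediction that is three frames old.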