Self-driving simulations typically rely on data collected in a small number of cities or on hand-authored synthetic scenarios. Dashcam videos cover a far broader range of locations and situations, including rare or long-tailed scenarios, but they are considered less usable for simulation because it is difficult to recover accurate 4D scenes from monocular in-the-wild videos. Work zones are one such class of long-tailed situations that dashcams capture. We present Dash2Sim, a framework that turns in-the-wild monocular dashcam videos into metric, geo-referenced 4D driving logs compatible with existing simulators, and verifies each log against an independently maintained map without annotations. We apply Dash2Sim to a large video corpus to create the ROADWork4D benchmark dataset, which spans 4,244 scenes with 2.7M 3D objects across 17 cities. On a verified subset, ROADWork4D-CL (2,201 scenes), we study privileged closed-loop planners and find that work zone scenarios are difficult: while rule-based and hybrid planners generalize better than learning-based ones, all fall short, failing to make the lane changes that temporary work zone channels require. Beyond planning, dense depth recovered by Dash2Sim improves novel-view synthesis quality by up to 19% on perceptual metrics, suggesting its potential to provide rich conditioning for closed-loop sensor simulation from monocular videos.