Perceiving and autonomously navigating through work zones is a challenging and underexplored problem. Open datasets for this long-tailed scenario are scarce. We propose the ROADWork dataset to learn to recognize, observe, analyze, and drive through work zones. State-of-the-art foundation models fail when applied to work zones. Fine-tuning models on our dataset significantly improves perception and navigation in work zones. With the ROADWork dataset, we discover new work zone images with higher precision (+32.5%) at a much higher rate (12.8$\times$) around the world. Open-vocabulary methods fail too, whereas fine-tuned detectors improve performance (+32.2 AP). Vision-Language Models (VLMs) struggle to describe work zones, but fine-tuning substantially improves performance (+36.7 SPICE). Beyond fine-tuning, we show the value of simple techniques. Video label propagation provides additional gains (+2.6 AP) for instance segmentation. For reading work zone signs, composing a detector and a text spotter via crop-scaling improves performance (+14.2% 1-NED). Composing work zone detections to provide context further reduces hallucinations (+3.9 SPICE) in VLMs. We predict navigational goals and compute drivable paths from work zone videos. Incorporating road work semantics ensures 53.6% of goals have angular error (AE) < 0.5 (+9.9%) and 75.3% of pathways have AE < 0.5 (+8.1%).
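As a rough illustration of the crop-scaling composition mentioned above, the sketch below crops each detected sign, enlarges the crop, and hands it to a text spotter. This is a minimal sketch, not the paper's implementation: `detect_signs`, `spot_text`, the padding, and the scale factor are hypothetical placeholders, and OpenCV is assumed only for resizing.

```python
# Hypothetical sketch: compose a sign detector with a text spotter via crop-scaling.
# `detect_signs` and `spot_text` are placeholder callables, not APIs from the paper.
from typing import Callable, List, Tuple

import cv2
import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates


def read_signs(
    image: np.ndarray,
    detect_signs: Callable[[np.ndarray], List[Box]],
    spot_text: Callable[[np.ndarray], str],
    scale: float = 4.0,   # assumed upscaling factor
    pad: int = 8,         # assumed padding around each detected box
) -> List[str]:
    """Crop each detected sign, enlarge the crop, then run the text spotter on it."""
    h, w = image.shape[:2]
    texts = []
    for x1, y1, x2, y2 in detect_signs(image):
        # Pad the box slightly so sign borders are not clipped.
        x1, y1 = max(0, x1 - pad), max(0, y1 - pad)
        x2, y2 = min(w, x2 + pad), min(h, y2 + pad)
        crop = image[y1:y2, x1:x2]
        # Upscale the crop so small sign text is large enough for the spotter.
        new_size = (int(crop.shape[1] * scale), int(crop.shape[0] * scale))
        crop = cv2.resize(crop, new_size, interpolation=cv2.INTER_CUBIC)
        texts.append(spot_text(crop))
    return texts
```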