Autonomous Vehicle (AV) perception systems require more than simply seeing the environment through, e.g., object detection or scene segmentation: they need a holistic understanding of what is happening within the scene in order to interact safely with other road users. Few datasets exist for developing and training algorithms to comprehend the actions of other road users. This paper presents ROAD-Waymo, an extensive dataset for the development and benchmarking of techniques for agent, action, location and event detection in road scenes, provided as a layer upon the (US) Waymo Open dataset. Considerably larger and more challenging than any existing dataset (and encompassing multiple cities), it comes with 198k annotated video frames, 54k agent tubes, 3.9M bounding boxes and a total of 12.4M labels. The integrity of the dataset has been confirmed and enhanced via a novel annotation pipeline designed to automatically identify violations of requirements specified for this dataset. As ROAD-Waymo is compatible with the original (UK) ROAD dataset, it provides the opportunity to tackle domain adaptation between real-world road scenarios in different countries within a novel benchmark: ROAD++.