A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook

Autonomous driving has developed rapidly and shown promising performance, driven by recent advances in hardware and deep learning. High-quality datasets are fundamental to developing reliable autonomous driving algorithms. Previous dataset surveys either covered only a limited number of datasets or lacked a detailed investigation of dataset characteristics. To address this, we present an exhaustive study of 265 autonomous driving datasets from multiple perspectives, including sensor modalities, data size, tasks, and contextual conditions. We introduce a novel metric to evaluate the impact of datasets, which can also serve as a guide for creating new datasets. We also analyze annotation processes, existing labeling tools, and the annotation quality of datasets, showing the importance of establishing a standard annotation pipeline. In addition, we thoroughly analyze the impact of geographical and adversarial environmental conditions on the performance of autonomous driving systems. Moreover, we present the data distribution of several vital datasets and discuss their pros and cons accordingly. Finally, we discuss current challenges and future development trends for autonomous driving datasets.

