Autonomous driving techniques have flourished in recent years, and with them the demand for large amounts of high-quality data. However, real-world datasets struggle to keep pace with changing requirements because collecting and labeling them is expensive and time-consuming. As a result, more and more researchers are turning to synthetic datasets, which make it easy to generate rich and variable data, both as an effective complement to real-world data and as a way to improve algorithm performance. In this paper, we summarize the evolution of synthetic dataset generation methods and review the work to date on synthetic datasets for single-task and multi-task categories in autonomous driving research. We also discuss the role that synthetic datasets play in evaluation, in gap testing, and in their positive effect on the testing of autonomous-driving-related algorithms, especially with respect to trustworthiness and safety. Finally, we discuss general trends and possible development directions. To the best of our knowledge, this is the first survey focusing on the application of synthetic datasets in autonomous driving. This survey also raises awareness of the problems facing real-world deployment of autonomous driving technology and offers researchers a possible solution.