Motion planning is a crucial component of autonomous driving. State-of-the-art motion planners are trained on meticulously curated datasets, which are not only expensive to annotate but also fail to capture rarely seen critical scenarios. Failing to account for such scenarios poses a significant risk to motion planners and may lead to incidents during testing. An intuitive solution is to compose such scenarios manually by programming and running a simulator (e.g., CARLA). However, this approach incurs substantial human effort. Motivated by this, we propose an inexpensive method for generating diverse critical traffic scenarios to train more robust motion planners. First, we represent traffic scenarios as scripts, which the simulator then executes to generate the scenarios. Next, we develop a method that accepts user-specified text descriptions, which a Large Language Model (LLM) translates into scripts using in-context learning. The output scripts are sent to the simulator, which produces the corresponding traffic scenarios. Because our method can generate abundant safety-critical traffic scenarios, we use them as synthetic training data for motion planners. To demonstrate the value of the generated scenarios, we train existing motion planners on our synthetic data, on real-world datasets, and on a combination of both. Our experiments show that motion planners trained with our data significantly outperform those trained solely on real-world data, demonstrating the usefulness of our synthetic data and the effectiveness of our data generation method. Our source code is available at https://ezharjan.github.io/AutoSceneGen.
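The text-to-script step relies on in-context learning: the LLM is shown a few (description, script) demonstrations and then asked to complete the script for a new description. The sketch below illustrates only that prompt-assembly idea plus a sanity check on the reply before it is handed to a simulator. It is a minimal illustration under stated assumptions, not the authors' implementation: the JSON script layout, the `REQUIRED_KEYS` schema, and the names `Example`, `build_prompt`, and `parse_script` are all hypothetical, and no actual LLM or CARLA call is made.

```python
import json
from dataclasses import dataclass


@dataclass
class Example:
    description: str  # natural-language scenario description
    script: dict      # HYPOTHETICAL scenario script layout (JSON-like)


# HYPOTHETICAL schema: top-level keys every generated script must contain.
REQUIRED_KEYS = {"map", "ego", "actors", "events"}


def build_prompt(examples, query):
    """Assemble a few-shot (in-context learning) prompt: an instruction,
    each demonstration as a Description/Script pair, then the new query
    followed by an empty Script slot for the LLM to complete."""
    parts = ["Translate each traffic-scenario description into a JSON script."]
    for ex in examples:
        parts.append(f"Description: {ex.description}")
        parts.append(f"Script: {json.dumps(ex.script, sort_keys=True)}")
    parts.append(f"Description: {query}")
    parts.append("Script:")
    return "\n".join(parts)


def parse_script(llm_output):
    """Parse the LLM's reply and reject malformed scripts before they
    would be sent to the simulator."""
    script = json.loads(llm_output)
    if not isinstance(script, dict):
        raise ValueError("script must be a JSON object")
    missing = REQUIRED_KEYS - set(script)
    if missing:
        raise ValueError(f"script missing keys: {sorted(missing)}")
    return script


if __name__ == "__main__":
    demo = Example(
        description="A pedestrian suddenly crosses in front of the ego vehicle.",
        script={
            "map": "Town01",
            "ego": {"speed": 8.0},
            "actors": [{"type": "pedestrian", "spawn": "ahead_right"}],
            "events": [{"actor": 0, "action": "cross", "trigger_distance": 15.0}],
        },
    )
    print(build_prompt([demo], "A car runs a red light at an intersection."))
```

Keeping the demonstrations in the same serialized form the model is expected to emit (here, sorted-key JSON) makes the reply straightforward to validate; a real pipeline would use whatever script format its simulator actually consumes.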