Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models

Generating varied scenarios through simulation is crucial for training and evaluating safety-critical systems such as autonomous vehicles. Yet modeling the trajectories of other vehicles to simulate diverse and meaningful close interactions remains prohibitively costly. Generating driving behaviors from language descriptions is a promising strategy that offers human operators a scalable and intuitive way to simulate a wide range of driving interactions. However, the scarcity of large-scale annotated language-trajectory data makes this approach challenging. To address this gap, we propose Text-to-Drive (T2D), which synthesizes diverse driving behaviors via Large Language Models (LLMs). We introduce a knowledge-driven approach that operates in two stages. In the first stage, we employ the embedded knowledge of LLMs to generate diverse language descriptions of driving behaviors for a scene. In the second stage, we leverage the LLM's reasoning capabilities to synthesize these behaviors in simulation. At its core, T2D employs an LLM to construct a state chart that maps low-level states to high-level abstractions. This strategy aids downstream tasks such as summarizing low-level observations, assessing policy alignment with the behavior description, and shaping the auxiliary reward, all without human supervision. With our knowledge-driven approach, we demonstrate that T2D generates more diverse trajectories than the baselines and offers a natural language interface that allows for interactive incorporation of human preference. More examples are available on our website: https://text-to-drive.github.io/

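To make the state chart idea concrete, the following is a minimal sketch of how a mapping from low-level states to high-level abstractions could support the three downstream uses named in the abstract: summarizing observations, checking alignment with a behavior description, and shaping an auxiliary reward. It is not the T2D implementation. In T2D the state chart is constructed by an LLM, whereas here hand-written threshold rules stand in for it, and every name (`Observation`, `abstract_state`, `summarize`, `progress`, `auxiliary_reward`, `CUT_IN`), the 5 m threshold, and the cut-in behavior are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Observation:
    """Hypothetical low-level state of another vehicle relative to the ego car."""
    gap_m: float      # longitudinal offset in meters; negative means behind the ego
    same_lane: bool   # True if the vehicle shares the ego's lane


def abstract_state(obs: Observation) -> str:
    """Map one low-level observation to a high-level state label.

    Assumption: fixed thresholds replace the LLM-constructed state chart.
    """
    if obs.gap_m < -5.0:
        return "behind"
    if obs.gap_m <= 5.0:
        return "alongside"
    return "ahead_same_lane" if obs.same_lane else "ahead_adjacent_lane"


def summarize(trajectory: Sequence[Observation]) -> List[str]:
    """Collapse a trajectory into its sequence of distinct high-level states."""
    summary: List[str] = []
    for obs in trajectory:
        state = abstract_state(obs)
        if not summary or summary[-1] != state:
            summary.append(state)
    return summary


def progress(summary: Sequence[str], target: Sequence[str]) -> int:
    """Count how many target states have been reached, in order."""
    matched = 0
    for state in summary:
        if matched < len(target) and state == target[matched]:
            matched += 1
    return matched


def auxiliary_reward(trajectory: Sequence[Observation], target: Sequence[str]) -> float:
    """Fraction of the described behavior completed so far, in [0, 1]."""
    return progress(summarize(trajectory), target) / len(target)


# Hypothetical behavior description "overtake and cut in", written as a state sequence.
CUT_IN = ["behind", "alongside", "ahead_adjacent_lane", "ahead_same_lane"]

if __name__ == "__main__":
    trajectory = [
        Observation(-12.0, False),
        Observation(-2.0, False),
        Observation(9.0, False),
        Observation(12.0, True),
    ]
    print(summarize(trajectory))
    print(auxiliary_reward(trajectory, CUT_IN))
```

In this sketch, a policy is considered aligned with the description once `progress` equals the length of the target sequence, and the same quantity doubles as a dense shaping signal. Which abstractions and transitions the real state chart contains depends on the LLM and the scene.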