Effective autonomous driving hinges on robust reasoning across perception, prediction, planning, and behavior. However, conventional end-to-end models often fail to generalize to complex scenarios because they lack structured reasoning. While recent vision-language models (VLMs) have been applied to driving tasks, they typically rely on isolated modules and static supervision, limiting their ability to support multi-stage decision-making. We present AutoDriveRL, a unified training framework that formulates autonomous driving as a structured reasoning process over four core tasks. Each task is independently modeled as a vision-language QA problem and optimized with a task-specific reward model, providing fine-grained reinforcement signals at each reasoning stage. Within this framework, we train DriveRX, a cross-task reasoning VLM designed for multi-stage decision-making. DriveRX achieves strong performance on a public benchmark, outperforming GPT-4o on behavior reasoning and remaining robust under complex or corrupted driving conditions. DriveRX serves as a high-level semantic reasoning backbone, producing structured, stage-wise reasoning chains that improve decision consistency; its outputs also provide high-quality supervisory signals for annotation and for downstream planning and control models. We release the AutoDriveRL framework and DriveRX to support future research.
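To make the framing concrete, the following is a minimal sketch of the "task-specific reward at each reasoning stage" idea. Everything here is a hypothetical illustration: the task list, the toy keyword-overlap scorer, and the per-stage weights stand in for the actual learned reward models and are not the AutoDriveRL implementation.

```python
# Hypothetical sketch: one reward signal per reasoning stage.
# All names (TASKS, TASK_WEIGHTS, keyword_overlap_reward) are illustrative only.

TASKS = ["perception", "prediction", "planning", "behavior"]

def keyword_overlap_reward(answer: str, reference: str) -> float:
    """Toy reward: fraction of reference keywords present in the answer."""
    ref_words = set(reference.lower().split())
    ans_words = set(answer.lower().split())
    if not ref_words:
        return 0.0
    return len(ref_words & ans_words) / len(ref_words)

# Stand-in for task-specific reward models: the same toy scorer,
# weighted differently per stage (later stages weighted more heavily here).
TASK_WEIGHTS = {"perception": 1.0, "prediction": 1.2,
                "planning": 1.5, "behavior": 2.0}

def stage_rewards(outputs: dict, references: dict) -> dict:
    """Compute a fine-grained reward for each reasoning stage independently."""
    return {
        task: TASK_WEIGHTS[task]
        * keyword_overlap_reward(outputs[task], references[task])
        for task in TASKS
    }

# One structured reasoning chain over the four stages.
outputs = {
    "perception": "a pedestrian is crossing ahead",
    "prediction": "the pedestrian will continue crossing",
    "planning": "slow down and yield",
    "behavior": "decelerate",
}
references = {
    "perception": "pedestrian crossing ahead",
    "prediction": "pedestrian continues crossing",
    "planning": "slow down yield",
    "behavior": "decelerate",
}
rewards = stage_rewards(outputs, references)
```

Each stage thus receives its own scalar reward rather than one trajectory-level score, which is what allows the reinforcement signal to target individual links of the reasoning chain.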