End-to-end autonomous driving models trained solely with imitation learning (IL) often suffer from poor generalization. In contrast, reinforcement learning (RL) promotes exploration through reward maximization but faces challenges such as sample inefficiency and unstable convergence. A natural solution is to combine IL and RL. Moving beyond the conventional two-stage paradigm (IL pretraining followed by RL fine-tuning), we propose CoIRL-AD, a competitive dual-policy framework that enables IL and RL agents to interact during training. CoIRL-AD introduces a competition-based mechanism that facilitates knowledge exchange while preventing gradient conflicts. Experiments on the nuScenes dataset show an 18% reduction in collision rate compared to baselines, along with stronger generalization and improved performance on long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.
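The abstract describes the competition mechanism only at a high level. The following minimal PyTorch-style sketch illustrates one plausible form of a competitive dual-policy update under simplifying assumptions: all names (`PolicyNet`, `competitive_step`, `reward_fn`, `distill_w`) and the reward-proxy RL objective are illustrative, not the released implementation; refer to the linked repository for the actual method.

```python
# Illustrative sketch only: two separate policies (IL and RL) are updated with
# their own losses, and a competition step distills the better-scoring agent's
# plan into the other, so knowledge is exchanged without mixing gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    """Tiny MLP mapping a scene feature vector to a planned trajectory (hypothetical)."""
    def __init__(self, feat_dim=128, horizon=6):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, horizon * 2),  # (x, y) waypoints flattened
        )

    def forward(self, feats):
        return self.mlp(feats)

def competitive_step(il_policy, rl_policy, il_opt, rl_opt,
                     feats, expert_traj, reward_fn, distill_w=0.5):
    """One training step of a competitive dual-policy scheme (assumed form)."""
    # 1) IL agent: plain imitation loss against expert waypoints.
    il_pred = il_policy(feats)
    il_loss = F.l1_loss(il_pred, expert_traj)

    # 2) RL agent: maximize a differentiable reward proxy
    #    (a stand-in for the paper's actual RL objective).
    rl_pred = rl_policy(feats)
    rl_loss = -reward_fn(rl_pred).mean()

    # 3) Competition: score both agents, then distill the winner's plan
    #    into the loser so knowledge flows between the two policies.
    with torch.no_grad():
        il_score = reward_fn(il_pred).mean()
        rl_score = reward_fn(rl_pred).mean()
    if il_score >= rl_score:
        rl_loss = rl_loss + distill_w * F.mse_loss(rl_pred, il_pred.detach())
    else:
        il_loss = il_loss + distill_w * F.mse_loss(il_pred, rl_pred.detach())

    # 4) Each optimizer only updates its own policy, so the IL and RL
    #    objectives never produce conflicting gradients on shared parameters.
    il_opt.zero_grad(); il_loss.backward(); il_opt.step()
    rl_opt.zero_grad(); rl_loss.backward(); rl_opt.step()
    return il_loss.item(), rl_loss.item()
```

For example, with `feats = torch.randn(8, 128)`, `expert_traj = torch.randn(8, 12)`, and a toy `reward_fn = lambda traj: -traj.pow(2).mean(dim=-1)`, calling `competitive_step` on two `PolicyNet` instances with separate Adam optimizers runs end to end; the key design choice sketched here is that the imitation and reward objectives each own a distinct set of parameters, with distillation as the only channel of interaction.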