End-to-end autonomous driving models trained with imitation learning (IL) often generalize poorly, particularly in long-tail scenarios where expert demonstrations are sparse. Reinforcement learning (RL) can provide complementary task-level supervision, but applying RL to real-world autonomous driving is challenging in offline settings without interactive simulators, where datasets are dominated by expert actions and provide limited behavioral diversity. We propose CoIRL-AD, a competitive dual-policy framework that integrates IL and RL under a unified offline training regime. CoIRL-AD decouples imitation and reward optimization into separate actors to alleviate objective conflicts, uses imagined future rollouts for long-horizon reward estimation, and introduces a competition mechanism that selectively transfers beneficial behaviors while keeping RL anchored to expert-like driving. Experiments on the nuScenes benchmark show that CoIRL-AD consistently improves robustness over strong IL-based baselines, with especially large gains in cross-city generalization and long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.