Autonomous buses run on fixed routes but must operate in open, dynamic urban environments. Disengagement events on these routes are often geographically concentrated and typically arise from planner failures in highly interactive regions. Such policy-level failures are difficult to correct using conventional imitation learning, which easily overfits to sparse disengagement data. To address this issue, this paper presents a Disengagement-Triggered Contrastive Continual Learning (DTCCL) framework that enables autonomous buses to improve planning policies through real-world operation. Each disengagement triggers cloud-based data augmentation that generates positive and negative samples by perturbing surrounding agents while preserving route context. Contrastive learning refines policy representations to better distinguish safe and unsafe behaviors, and continual updates are applied in a cloud-edge loop without human supervision. Experiments on urban bus routes demonstrate that DTCCL improves overall planning performance by 48.6 percent compared with direct retraining, validating its effectiveness for scalable, closed-loop policy improvement in autonomous public transport.
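As an illustration of the contrastive step described above, the following is a minimal sketch of an InfoNCE-style loss over one anchor embedding (for example, the policy representation of a disengagement scene) with positive and negative samples obtained from augmented scenes. The abstract does not state the exact loss used in DTCCL, so the InfoNCE form, the function name `info_nce_loss`, the `temperature` parameter, and the embedding shapes are all assumptions made for illustration.

```python
import numpy as np


def info_nce_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor (illustrative only).

    anchor:    (d,)   embedding of the reference scene
    positives: (P, d) embeddings of samples labeled safe (assumed labeling)
    negatives: (N, d) embeddings of samples labeled unsafe (assumed labeling)
    """
    def normalize(x):
        # Scale to unit length so dot products become cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    pos_logits = normalize(np.asarray(positives, dtype=float)) @ a / temperature
    neg_logits = normalize(np.asarray(negatives, dtype=float)) @ a / temperature

    losses = []
    for p in pos_logits:
        # Each positive competes against all negatives.
        logits = np.concatenate(([p], neg_logits))
        m = logits.max()  # subtract the max for a stable log-sum-exp
        log_denominator = m + np.log(np.exp(logits - m).sum())
        losses.append(log_denominator - p)
    return float(np.mean(losses))
```

The loss is small when the anchor lies close to the positives and far from the negatives, which is the separation between safe and unsafe behaviors that the contrastive objective is meant to encourage. How positives and negatives are actually generated from perturbed surrounding agents is specific to the paper and is not reproduced here.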