From Attacks to Curricula: Learnability-Guided Adversarial Training for Safe Autonomous Driving

Closed-loop adversarial training improves autonomous driving safety by exposing policies to rare safety-critical scenarios. Standard pipelines first generate adversarial scenarios and then sample them for policy optimization. However, most existing frameworks remain attack-oriented. Collision-driven generators often synthesize unsolvable extreme situations that can degrade learning, while heuristic samplers ignore the evolving capability of the driving policy, causing sample inefficiency and delayed convergence. We propose AlignADV, a learnability-guided closed-loop adversarial training framework that converts adversarial scenarios into resolvable, capability-aligned curricula. First, we reformulate adversarial scenario generation as a preference alignment problem and employ direct preference optimization to guide the generator toward critical yet resolvable scenarios. Second, we introduce behavioral fingerprints to capture the intrinsic characteristics of the evolving policy and construct a multi-modal capability prediction model that estimates policy performance without running expensive closed-loop simulations. By combining resolvability-aligned scenarios with capability predictions, AlignADV builds a dynamic curriculum sampling mechanism that prioritizes scenarios targeting the current policy's vulnerabilities. Experiments on the Waymo Open Motion Dataset demonstrate that AlignADV improves convergence efficiency and final performance, reducing training steps by up to 40.6 percent compared with baseline methods while lowering the collision rate and improving route completion under both normal and adversarial traffic conditions. These results highlight a shift from attack-oriented scenario generation to learnability-guided policy improvement, offering a principled direction for safer and more efficient autonomous driving training. Project page: https://meiyuewen.github.io/AlignADV/.
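
The abstract does not spell out AlignADV's exact objectives, so the following is only a hedged sketch of the two ideas it names. The first function is the standard direct preference optimization (DPO) loss for one preference pair, here assuming the preferred sample is a critical yet resolvable scenario and the dispreferred one is an unsolvable scenario. The second is a hypothetical capability-aligned sampler: it assumes a capability model outputs a predicted failure probability per scenario plus a resolvability flag, gives unresolvable scenarios zero weight, and softmax-weights the rest so that scenarios the current policy is predicted to fail are drawn more often. The names `dpo_loss`, `curriculum_weights`, `pred_fail`, `resolvable`, `beta`, and `temperature` are illustrative assumptions, not the paper's API or method.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    Assumption (illustrative only): w is a critical yet resolvable scenario,
    l is an unsolvable one; logp_* are generator log-likelihoods and
    ref_logp_* are the frozen reference generator's log-likelihoods.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


def curriculum_weights(pred_fail: Sequence[float],
                       resolvable: Sequence[bool],
                       temperature: float = 0.5) -> list[float]:
    """Hypothetical sampling distribution over candidate scenarios.

    pred_fail[i]: predicted probability that the current policy fails
    scenario i (assumed output of a capability model).
    resolvable[i]: whether scenario i is judged resolvable.
    Unresolvable scenarios get weight 0; the rest are softmax-weighted
    by predicted failure probability, so likely vulnerabilities dominate.
    """
    logits = [p / temperature if ok else -math.inf
              for p, ok in zip(pred_fail, resolvable)]
    m = max(logits)
    if m == -math.inf:
        raise ValueError("no resolvable scenarios to sample")
    exps = [math.exp(z - m) for z in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]
```

A batch could then be drawn with the standard library, for example `random.choices(range(len(w)), weights=w, k=batch_size)` where `w = curriculum_weights(pred_fail, resolvable)`. Lowering `temperature` concentrates sampling on the scenarios with the highest predicted failure probability; raising it moves the distribution toward uniform over resolvable scenarios.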