Automated parking is a challenging operational domain for advanced driver assistance systems, requiring robust scene understanding and interaction reasoning. The key challenge is twofold: (i) predicting multiple plausible ego intentions from context and (ii) predicting, for each intention, the joint responses of surrounding agents, which enables effective what-if decision-making. Existing methods typically treat these interdependent problems in isolation. We propose ParkDiffusion++, which jointly learns a multi-modal ego intention predictor and an ego-conditioned multi-agent joint trajectory predictor for automated parking. Our approach makes four contributions. First, we introduce an ego intention tokenizer that predicts a small set of discrete endpoint intentions from agent histories and vectorized map polylines. Second, we perform ego-intention-conditioned joint prediction, yielding socially consistent predictions of the surrounding agents for each possible ego intention. Third, we employ a lightweight safety-guided denoiser with different constraints to refine joint scenes during training, improving accuracy and safety. Fourth, we propose counterfactual knowledge distillation, in which an EMA teacher refined by a frozen safety-guided denoiser provides pseudo-targets that capture how agents react to alternative ego intentions. Extensive evaluations demonstrate that ParkDiffusion++ achieves state-of-the-art performance on the Dragon Lake Parking (DLP) dataset and the Intersections Drone (inD) dataset. Qualitative what-if visualizations further show that other agents react appropriately to different ego intentions.
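The EMA teacher mentioned in the fourth contribution can be illustrated with a generic exponential-moving-average weight update. This is a minimal sketch of the standard EMA rule only, not the authors' implementation: the function name `ema_update`, the dictionary-of-lists parameter layout, and the decay value are all assumptions made for illustration, and the abstract does not state how the teacher is actually updated or what decay is used.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, decay: float = 0.99) -> Params:
    """Return updated teacher weights: decay * teacher + (1 - decay) * student.

    The decay default is an illustrative assumption, not a value from the paper.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return {
        name: [
            decay * t + (1.0 - decay) * s
            for t, s in zip(teacher[name], student[name])
        ]
        for name in teacher
    }


# Toy usage: the teacher drifts slowly toward the student after each step.
teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 1.0]}
teacher = ema_update(teacher, student, decay=0.9)
```

With a decay close to 1 the teacher changes slowly, which is the usual reason for using an EMA copy as a stable source of pseudo-targets in distillation setups.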