End-to-End Autonomous Driving (E2E-AD) systems are typically grouped by what they output: (i) waypoint-based models, which predict a future trajectory, and (ii) action-based models, which directly output throttle, steering, and brake commands. Most recent benchmark protocols and training pipelines are built around waypoints, which makes action-based policies harder to train and compare and has slowed their progress. To bridge this waypoint-action gap, we propose a novel, differentiable vehicle-model framework that rolls out predicted action sequences into their corresponding ego-frame waypoint trajectories, so that supervision can be applied in waypoint space. Our approach enables action-based architectures to be trained and evaluated, for the first time, within waypoint-based benchmarks without modifying the underlying evaluation protocol. We evaluate our framework on multiple challenging benchmarks and observe consistent improvements over the baselines. In particular, on NAVSIM \texttt{navhard} our approach achieves state-of-the-art performance. Our code will be made publicly available upon acceptance.
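To make the action-to-waypoint rollout concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the vehicle model, so a kinematic bicycle model is assumed here purely for illustration. The function names (`rollout_actions`, `waypoint_l2_loss`) and all parameter values (time step, wheelbase, acceleration and steering limits, initial speed) are hypothetical. The sketch uses plain Python for readability; because each step is built from smooth or piecewise-linear operations, the same recurrence written in an autodiff framework would let a waypoint-space loss propagate gradients back to the predicted actions.

```python
import math


def rollout_actions(actions, v0=5.0, dt=0.5, wheelbase=2.8,
                    max_accel=3.0, max_brake=6.0, max_steer=0.5):
    """Roll out (throttle, steer, brake) actions into ego-frame waypoints.

    Assumed kinematic bicycle model; every default value is illustrative.
    throttle and brake are in [0, 1], steer is in [-1, 1].
    """
    # Ego frame: the vehicle starts at the origin, heading along +x.
    x, y, yaw, v = 0.0, 0.0, 0.0, v0
    waypoints = []
    for throttle, steer, brake in actions:
        accel = max_accel * throttle - max_brake * brake  # m/s^2
        delta = max_steer * steer                         # wheel angle, rad
        # Explicit Euler step: position and heading use the current speed.
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += v / wheelbase * math.tan(delta) * dt
        # Piecewise-linear clamp (no reversing), analogous to a ReLU.
        v = max(v + accel * dt, 0.0)
        waypoints.append((x, y))
    return waypoints


def waypoint_l2_loss(pred, target):
    """Mean squared distance between predicted and target waypoints."""
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, target))
    return total / len(pred)


if __name__ == "__main__":
    # Gentle left turn under light throttle, compared with driving straight.
    turning = rollout_actions([(0.2, 0.3, 0.0)] * 4)
    straight = rollout_actions([(0.0, 0.0, 0.0)] * 4)
    print(turning)
    print(waypoint_l2_loss(turning, straight))
```

In this sketch the policy would predict the `actions` sequence, and training would minimize `waypoint_l2_loss` against the benchmark's ground-truth waypoints, so the evaluation protocol itself never needs to see actions.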