Trajectory sampling in the Frenet (road-aligned) frame is one of the most popular methods for motion planning of autonomous vehicles. It operates by sampling a set of behavioural inputs, such as lane offset and forward speed, and then solving a trajectory optimization problem conditioned on the sampled inputs. The sampling is typically handcrafted from simple heuristics, does not adapt to the driving scenario, and is oblivious to the capabilities of the downstream trajectory planner. In this paper, we propose end-to-end learning of the behavioural input distribution, either from expert demonstrations or in a self-supervised manner. Our core novelty lies in embedding a custom differentiable trajectory optimizer as a layer in a neural network, which allows us to update the behavioural inputs using the optimizer's feedback. Our end-to-end approach also ensures that the learned behavioural inputs aid the convergence of the optimizer. We improve the state of the art in two respects. First, we show that learned behavioural inputs substantially decrease the collision rate while improving driving efficiency over handcrafted approaches. Second, our approach outperforms model predictive control methods based on sampling-based optimization.
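To make the handcrafted baseline concrete, the following is a minimal sketch of grid-based behavioural sampling in the Frenet frame. It is not the optimizer proposed in the paper: a closed-form minimum-jerk lateral profile stands in for the trajectory optimization step, obstacles are assumed static, and every function name, grid value, and cost weight (`rollout`, `cost`, `plan`, the 3.5 m lane width, the 15 m/s desired speed) is a hypothetical choice made for illustration.

```python
import itertools


def lateral_profile(d0, d_target, n_steps):
    """Minimum-jerk, rest-to-rest lateral offset from d0 to d_target."""
    profile = []
    for k in range(n_steps + 1):
        s = k / n_steps  # normalised time in [0, 1]
        blend = 10 * s**3 - 15 * s**4 + 6 * s**5
        profile.append(d0 + (d_target - d0) * blend)
    return profile


def rollout(d0, s0, lane_offset, speed, horizon=4.0, n_steps=20):
    """Frenet trajectory [(s, d), ...] for one behavioural input.

    Assumption: constant forward speed along the road (s) and a smooth
    lateral move (d) to the sampled lane offset.
    """
    dt = horizon / n_steps
    d = lateral_profile(d0, lane_offset, n_steps)
    return [(s0 + speed * k * dt, d[k]) for k in range(n_steps + 1)]


def cost(traj, speed, obstacles, desired_speed=15.0, safe_radius=2.0):
    """Illustrative cost: collisions, speed deviation, final lane offset."""
    collisions = sum(
        1
        for (s, d) in traj
        for (s_obs, d_obs) in obstacles
        if (s - s_obs) ** 2 + (d - d_obs) ** 2 < safe_radius**2
    )
    return (
        1000.0 * collisions
        + (speed - desired_speed) ** 2
        + 0.1 * traj[-1][1] ** 2
    )


def plan(d0, s0, obstacles,
         lane_offsets=(-3.5, 0.0, 3.5), speeds=(5.0, 10.0, 15.0)):
    """Evaluate a fixed, handcrafted grid of behavioural inputs.

    Returns (cost, lane_offset, speed, trajectory) for the cheapest sample.
    The grid does not depend on the scene, which is the limitation the
    abstract points out.
    """
    best = None
    for lane_offset, speed in itertools.product(lane_offsets, speeds):
        traj = rollout(d0, s0, lane_offset, speed)
        c = cost(traj, speed, obstacles)
        if best is None or c < best[0]:
            best = (c, lane_offset, speed, traj)
    return best
```

Calling `plan(0.0, 0.0, obstacles=[(30.0, 0.0)])` evaluates all nine grid points against a single static obstacle in the ego lane. Because the grid is fixed, the planner can only choose among these nine candidates regardless of the scene; replacing the fixed grid with a learned, scene-conditioned distribution is the change the abstract proposes.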