Integrating Decision-Making Into Differentiable Optimization Guided Learning for End-to-End Planning of Autonomous Vehicles

We address the decision-making capability within an end-to-end planning framework that focuses on motion prediction, decision-making, and trajectory planning. Specifically, we formulate decision-making and trajectory planning as a differentiable nonlinear optimization problem, which ensures compatibility with learning-based modules to establish an end-to-end trainable architecture. This optimization introduces explicit objectives related to safety, traveling efficiency, and riding comfort, guiding the learning process in our proposed pipeline. Intrinsic constraints resulting from the decision-making task are integrated into the optimization formulation and preserved throughout the learning process. By integrating the differentiable optimizer with a neural network predictor, the proposed framework is end-to-end trainable, aligning various driving tasks with ultimate performance goals defined by the optimization objectives. The proposed framework is trained and validated using the Waymo Open Motion dataset. The open-loop testing reveals that while the planning outcomes using our method do not always resemble the expert trajectory, they consistently outperform baseline approaches with improved safety, traveling efficiency, and riding comfort. The closed-loop testing further demonstrates the effectiveness of optimizing decisions and improving driving performance. Ablation studies demonstrate that the initialization provided by the learning-based prediction module is essential for the convergence of the optimizer as well as the overall driving performance.
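To make the idea of a differentiable optimizer guiding a predictor concrete, the following is a minimal toy sketch, not the formulation used in this work. It assumes a single scalar ego position, a squared-hinge safety penalty on the gap to a lead vehicle, and unrolled gradient descent with a hand-derived sensitivity instead of an automatic differentiation library. All names and values (`solve_plan`, `train_prediction`, `x_goal`, `d_safe`, `w_safe`, the learning rates) are hypothetical and chosen only for illustration.

```python
# Toy illustration only: scalar plan, one predicted quantity, hand-derived gradients.
# Every name and constant here is an assumption made for this sketch.

X_GOAL = 30.0   # efficiency target: position the ego would like to reach
D_SAFE = 5.0    # desired minimum gap to the lead vehicle
W_SAFE = 9.0    # weight of the safety penalty


def cost_grad(x, p_lead):
    """d/dx of 0.5*(x - X_GOAL)^2 + 0.5*W_SAFE*max(0, x - (p_lead - D_SAFE))^2."""
    violation = x - (p_lead - D_SAFE)
    return (x - X_GOAL) + W_SAFE * max(0.0, violation)


def solve_plan(p_lead, x_init=0.0, lr=0.05, steps=200):
    """Inner optimizer: unrolled gradient descent on the planning cost.

    Returns the planned position x and dx/dp_lead, the sensitivity of the
    plan to the predicted lead position, propagated through every step.
    """
    x, dx_dp = x_init, 0.0
    for _ in range(steps):
        active = 1.0 if x - (p_lead - D_SAFE) > 0.0 else 0.0
        hess_xx = 1.0 + W_SAFE * active   # d(grad)/dx at the current iterate
        hess_xp = -W_SAFE * active        # d(grad)/dp_lead at the current iterate
        # Differentiate the update x <- x - lr * grad with respect to p_lead.
        dx_dp = dx_dp - lr * (hess_xx * dx_dp + hess_xp)
        x = x - lr * cost_grad(x, p_lead)
    return x, dx_dp


def train_prediction(p_pred, p_true, outer_lr=0.05, epochs=50):
    """Outer loop: adjust the prediction so the resulting plan has low cost
    under the true lead position, using the gradient through the optimizer."""
    for _ in range(epochs):
        x, dx_dp = solve_plan(p_pred)
        # Chain rule: d(true cost)/dp_pred = d(true cost)/dx * dx/dp_pred.
        p_pred -= outer_lr * cost_grad(x, p_true) * dx_dp
    return p_pred


if __name__ == "__main__":
    print(solve_plan(20.0))
    print(train_prediction(26.0, 20.0))
```

With these assumed constants and a lead vehicle at 20, the safety term is active at the optimum, so the closed-form solution is x = (30 + 9 * 15) / 10 = 16.5 with sensitivity 9 / 10 = 0.9, and the unrolled solver should approach both. The outer loop then shows the training signal in miniature: a prediction that starts at 26 is pulled toward the true value of 20 because the planning cost, not an imitation loss, supplies the gradient.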
