Balancing safety, efficiency, and operational cost in highway driving poses a challenging decision-making problem for heavy-duty vehicles. A central difficulty is that conventional scalar reward formulations, obtained by aggregating these competing objectives, often obscure the structure of their trade-offs. We present a multi-objective reinforcement learning framework based on Proximal Policy Optimization that learns a continuous set of policies explicitly representing these trade-offs, and we evaluate it on a scalable simulation platform for tactical decision making in trucks. The approach learns a continuous set of Pareto-optimal policies capturing the trade-offs among three conflicting objectives: safety, quantified by collisions and successful episode completion; energy efficiency, quantified by energy cost; and time efficiency, quantified by driver cost. The resulting Pareto frontier is smooth and interpretable, allowing driving behavior to be chosen flexibly along the conflicting objectives. The framework supports seamless transitions between driving policies without retraining, yielding a robust and adaptive decision-making strategy for autonomous trucking applications.
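A minimal sketch of the core idea, assuming linear scalarization of the three per-step objective rewards under a preference vector on the probability simplex (the function and variable names here are illustrative, not the paper's actual implementation; the paper's scalarization and conditioning scheme may differ):

```python
import numpy as np

def scalarize(reward_vec, weights):
    """Linearly scalarize a multi-objective reward under a preference vector.

    reward_vec: per-step rewards for (safety, energy efficiency, time
    efficiency). weights: non-negative preference vector, normalized onto
    the simplex. Conditioning a single PPO policy on `weights` is one way
    a continuous set of trade-off policies can be covered by one network,
    so the operating point can be changed at deployment without retraining.
    """
    r = np.asarray(reward_vec, dtype=float)
    w = np.asarray(weights, dtype=float)
    assert r.shape == w.shape and np.all(w >= 0) and w.sum() > 0
    w = w / w.sum()  # project onto the simplex
    return float(w @ r)

# Example preference that emphasizes safety over energy and driver cost:
# safety reward +1.0, energy cost -0.3, driver (time) cost -0.2.
print(scalarize([1.0, -0.3, -0.2], [0.6, 0.2, 0.2]))  # -> 0.5
```

Sweeping `weights` across the simplex and evaluating the resulting policies is what traces out the Pareto frontier described above.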