We present a framework that enables autonomous vehicles to interact with cyclists while balancing safety and optimality. The approach integrates Hamilton-Jacobi reachability analysis with deep Q-learning to jointly address safety guarantees and time-efficient navigation. A value function, computed as the solution to a time-dependent Hamilton-Jacobi-Bellman inequality, provides a quantitative measure of safety for each system state. This safety measure is incorporated as a structured reward signal within the reinforcement learning framework. The method also models the cyclist's latent response to the vehicle, so that disturbance inputs reflect human comfort and behavioral adaptation. The framework is evaluated in simulation and compared against human driving behavior and an existing state-of-the-art method.
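The abstract does not give the form of the reward, so the following is only a minimal sketch of how a safety value could enter a Q-learning reward and temporal-difference target. Everything named here is an assumption made for illustration: the signed-distance stand-in for the Hamilton-Jacobi value function, the sign convention (positive means safe), and the parameters `min_gap`, `margin`, `step_cost`, `unsafe_penalty`, and `goal_bonus`.

```python
import numpy as np


def safety_value(rel_pos, min_gap=2.0):
    """Hypothetical stand-in for the reachability value function.

    Returns the vehicle-cyclist distance minus an assumed minimum gap, so
    positive values are treated as safe. A real implementation would look up
    the solution of the Hamilton-Jacobi inequality on a state grid instead.
    """
    return float(np.linalg.norm(rel_pos) - min_gap)


def shaped_reward(rel_pos, reached_goal, margin=1.0, step_cost=0.1,
                  unsafe_penalty=10.0, goal_bonus=5.0):
    """Combine the safety value with a time-efficiency term (assumed form)."""
    v = safety_value(rel_pos)
    if v < 0.0:
        # Inside the assumed unsafe set: fixed penalty dominates everything.
        return -unsafe_penalty
    # Saturate the safety term so the agent is not paid for staying far away.
    safety_term = min(v, margin)
    # The per-step cost pushes the agent toward time-efficient behavior.
    return safety_term - step_cost + (goal_bonus if reached_goal else 0.0)


def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return float(reward)
    return float(reward + gamma * np.max(next_q_values))


if __name__ == "__main__":
    r = shaped_reward(np.array([3.0, 4.0]), reached_goal=False)
    print(r, td_target(r, np.array([1.0, 2.0]), gamma=0.5))
```

In a deep Q-learning setting, `td_target` would supply the regression target for the network's prediction of the taken action's value; the saturation in `shaped_reward` is one possible design choice to keep the safety term from outweighing the time cost.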