Current autonomous driving systems rely heavily on V2X communication data to enhance situational awareness and cooperation between vehicles. A major challenge, however, is that V2X data may not arrive periodically, because wireless transmission between road stations and the receiving vehicle is subject to unpredictable delays and data loss. Control strategies for connected and autonomous vehicles should account for this issue. This paper therefore proposes a novel 'Blind Actor-Critic' algorithm that guarantees robust driving performance in V2X environments with delayed and/or lost data. The algorithm incorporates three key mechanisms: a virtual fixed sampling period, a combination of Temporal-Difference and Monte Carlo learning, and a numerical approximation of immediate reward values. We first illustrate the temporal aperiodicity problem of V2X data. We then explain the Blind Actor-Critic algorithm in detail, highlighting the components proposed to compensate for this aperiodicity. We evaluate the algorithm in a simulation environment and compare it with benchmark approaches. The results show improved training metrics compared to conventional actor-critic algorithms. In addition, testing results show that the approach provides robust control even at low V2X network reliability levels.
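To make the temporal aperiodicity problem concrete, the following is a minimal sketch, not the paper's implementation. It assumes that messages are sent at a nominal period, that each message is independently lost with a fixed probability or delayed by a uniformly distributed amount, and that a receiver maps the irregular arrivals onto a virtual fixed sampling grid by holding the most recently received value. The function names, the parameters (`loss_prob`, `max_delay`) and the hold-last-value rule are all hypothetical choices made for illustration.

```python
import random


def simulate_arrivals(n_messages, period, loss_prob, max_delay, seed=0):
    """Return (arrival_time, value) pairs sorted by arrival time.

    Assumption: message k is sent at k * period, is lost with probability
    loss_prob, and otherwise arrives after a uniform delay in [0, max_delay].
    """
    rng = random.Random(seed)
    arrivals = []
    for k in range(n_messages):
        if rng.random() < loss_prob:
            continue  # message lost during wireless transmission
        sent = k * period
        arrivals.append((sent + rng.uniform(0.0, max_delay), float(k)))
    arrivals.sort()
    return arrivals


def resample_fixed_period(arrivals, period, n_steps):
    """Map irregular arrivals onto a fixed grid of n_steps ticks.

    At each tick, the most recently received value is used (hypothetical
    hold-last-value rule); None means nothing has arrived yet.
    """
    samples = []
    idx = 0
    latest = None
    for step in range(n_steps):
        tick = step * period
        # Consume every message that has arrived by this tick.
        while idx < len(arrivals) and arrivals[idx][0] <= tick:
            latest = arrivals[idx][1]
            idx += 1
        samples.append(latest)
    return samples


if __name__ == "__main__":
    arrivals = simulate_arrivals(20, period=0.1, loss_prob=0.3, max_delay=0.25)
    print(resample_fixed_period(arrivals, period=0.1, n_steps=20))
```

In this sketch the controller always sees one sample per virtual tick, but repeated values mark ticks at which no fresh V2X data was available. Under these assumptions, that is the situation a learning algorithm has to cope with when the network is unreliable.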