Delayed and partially observable state information poses significant challenges for reinforcement learning (RL)-based control in real-world autonomous driving. In highway on-ramp merging, a roadside unit (RSU) can sense nearby traffic, perform edge perception, and transmit state estimates to the ego vehicle over vehicle-to-infrastructure (V2I) links. With recent advancements in intelligent transportation infrastructure and edge computing, such RSU-assisted perception is increasingly realistic and already deployed in modern connected roadway systems. However, edge processing time and wireless transmission can introduce stochastic V2I communication delays, violating the Markov assumption and substantially degrading control performance. In this work, we propose DAROM, a Delay-Aware Reinforcement Learning framework for On-ramp Merging that is robust to stochastic delays. We model the problem as a random delay Markov decision process (RDMDP) and develop a unified RL agent for joint longitudinal and lateral control. To recover a Markovian representation under delayed observations, we introduce a Delay-Aware Encoder that conditions on delayed observations, masked action histories, and observed delay magnitude to infer the current latent state. We further integrate a physics-based safety controller to reduce collision risk during merging. Experiments in the Simulation of Urban MObility (SUMO) simulator using real-world traffic data from the Next Generation Simulation (NGSIM) dataset demonstrate that DAROM consistently outperforms standard RL baselines across traffic densities. In particular, the gated recurrent unit (GRU)-based encoder achieves over 99% success in high-density traffic with random V2I delays of up to 2.0 seconds.
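To make the inputs of the Delay-Aware Encoder concrete, the following is a minimal sketch of how a delayed observation, a masked action history, and the observed delay magnitude might be assembled before being passed to a recurrent encoder. It is not the authors' implementation. The function name `build_encoder_input`, the fixed-length history buffer, the oldest-to-newest ordering, the zero-masking convention, and the delay normalization are all assumptions made for illustration; the abstract specifies only which three quantities the encoder conditions on.

```python
from typing import List, Sequence, Tuple


def build_encoder_input(
    delayed_obs: Sequence[float],
    action_history: Sequence[Sequence[float]],
    delay_steps: int,
    max_delay_steps: int,
) -> Tuple[List[float], List[List[float]], List[int], float]:
    """Assemble hypothetical encoder inputs for one control step.

    Assumptions (not taken from the paper):
      * action_history holds exactly max_delay_steps actions, oldest first.
      * Only the last `delay_steps` actions were applied after the delayed
        observation was generated, so older entries are masked to zero.
      * The delay is normalized by the maximum delay.
    """
    if max_delay_steps <= 0:
        raise ValueError("max_delay_steps must be positive")
    if not 0 <= delay_steps <= max_delay_steps:
        raise ValueError("delay_steps must lie in [0, max_delay_steps]")
    if len(action_history) != max_delay_steps:
        raise ValueError("action_history must have max_delay_steps entries")

    # 1 marks actions taken since the delayed observation; 0 marks padding.
    n_masked = max_delay_steps - delay_steps
    mask = [0] * n_masked + [1] * delay_steps

    # Zero out actions that the delayed observation already reflects.
    masked_actions = [
        [value * m for value in action]
        for action, m in zip(action_history, mask)
    ]

    normalized_delay = delay_steps / max_delay_steps
    return list(delayed_obs), masked_actions, mask, normalized_delay


if __name__ == "__main__":
    # Toy example: 2-D actions (e.g. acceleration, lane-change flag),
    # a buffer of 4 steps, and an observed delay of 2 steps.
    obs, actions, mask, delay = build_encoder_input(
        delayed_obs=[1.0, 2.0],
        action_history=[[0.1, 0.0], [0.2, 0.0], [0.3, 1.0], [0.4, 0.0]],
        delay_steps=2,
        max_delay_steps=4,
    )
    print(mask, delay)
```

In a full agent, the masked action sequence would typically be fed step by step into a recurrent module such as a GRU, with the delayed observation and normalized delay supplied as conditioning inputs. The exact wiring used in DAROM is not described in the abstract.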