In this paper, we investigate fast spectrum sharing in vehicle-to-everything (V2X) communication. To improve the spectrum efficiency of the whole system, vehicle-to-vehicle (V2V) links reuse the spectrum of vehicle-to-infrastructure (V2I) links. We formulate this spectrum-sharing problem as a deep reinforcement learning (DRL) problem and solve it with proximal policy optimization (PPO). Because training a well-performing agent typically requires a large number of interactions, simulation-based training is common in communication networks. However, because of the reality gap between simulated and real environments, an agent that performs well in the simulator may suffer severe performance degradation when deployed directly in the real world. To address this issue, we take a preliminary step by proposing an algorithm based on meta reinforcement learning. The algorithm enables the agent to adapt rapidly to a new task using knowledge extracted from similar tasks, so it requires fewer interactions and less training time. Numerical results show that our method achieves near-optimal performance and converges rapidly.
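For orientation only, the sketch below illustrates the two generic building blocks the abstract names. It is not the authors' algorithm. The first function computes the standard PPO clipped surrogate objective. The abstract does not say which meta-RL method is used, so the second function substitutes a Reptile-style outer update purely as one well-known example of extracting shared knowledge from similar tasks. The function names `ppo_clip_objective` and `reptile_meta_update` and the parameters `clip_eps` and `meta_lr` are hypothetical, and NumPy arrays stand in for policy parameters.

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    new_logp, old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for the same samples.
    clip_eps: hypothetical default clipping range.
    """
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))  # r_t(theta)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Element-wise pessimistic bound, averaged over the batch
    return float(np.mean(np.minimum(unclipped, clipped)))

def reptile_meta_update(meta_params, task_params, meta_lr=0.1):
    """Reptile-style outer step (a swapped-in example, not the paper's method).

    meta_params: shared initialization, shape (d,).
    task_params: parameters after adapting to each training task, shape (k, d).
    The initialization moves toward the average of the adapted parameters,
    so a few gradient steps from it suit a new, similar task.
    """
    meta_params = np.asarray(meta_params, dtype=float)
    task_params = np.asarray(task_params, dtype=float)
    return meta_params + meta_lr * np.mean(task_params - meta_params, axis=0)

# Usage: ratios of about 2.0 and 0.5 are clipped to 1.2 and 0.8, respectively
obj = ppo_clip_objective(np.log([0.5, 0.2]), np.log([0.25, 0.4]), [1.0, -1.0])
new_init = reptile_meta_update(np.zeros(2), [[1.0, 2.0], [3.0, 4.0]], meta_lr=0.5)
```

Clipping the probability ratio keeps each policy update close to the data-collecting policy, which is the core idea of PPO. The Reptile step is only one simple way to obtain an initialization that adapts quickly, and other meta-RL methods would fit the description in the abstract equally well.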