Vehicular Ad-hoc Networks (VANETs) are the digital cornerstone of autonomous driving, yet in urban environments they suffer from severe network fragmentation caused by physical obstructions. Unmanned Aerial Vehicles (UAVs), with their high mobility, have emerged as a vital means of bridging these connectivity gaps. However, traditional Deep Reinforcement Learning (DRL)-based UAV deployment strategies lack a semantic understanding of road topology, which often results in blind exploration and poor sample efficiency. By contrast, Large Language Models (LLMs) have strong reasoning capabilities that can identify topological importance, though applying them to control tasks remains challenging. To address this, we propose the Semantic-Augmented DRL (SA-DRL) framework. First, we introduce a fragmentation quantification method based on Road Topology Graphs (RTG) and Dual Connected Graphs (DCG). Second, we design a four-stage pipeline that transforms a general-purpose LLM into a domain-specific topology expert. Finally, we present the Semantic-Augmented PPO (SA-PPO) algorithm, which employs a Logit Fusion mechanism to inject the LLM's semantic reasoning directly into the policy as a prior, guiding the agent toward critical intersections. Extensive high-fidelity simulations show that SA-PPO achieves state-of-the-art performance and reaches baseline performance levels using only 26.6% of the training episodes. Overall, SA-PPO improves two key connectivity metrics by 13.2% and 23.5% over competing methods, while reducing energy consumption to just 28.2% of the baseline.
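To make the idea of injecting an LLM prior into a policy concrete, the following is a minimal sketch of one possible logit fusion scheme. It is not taken from the paper: the additive form, the weight `beta`, and the names `fuse_logits`, `policy_logits`, and `llm_scores` are all assumptions made for illustration, and the actual SA-PPO mechanism may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(policy_logits, llm_scores, beta=1.0):
    """Add a weighted LLM prior to the policy logits.

    ASSUMPTION: additive fusion with a scalar weight `beta`; the paper's
    abstract does not specify the exact form of its Logit Fusion mechanism.
    """
    if len(policy_logits) != len(llm_scores):
        raise ValueError("policy_logits and llm_scores must have equal length")
    return [p + beta * s for p, s in zip(policy_logits, llm_scores)]


# Hypothetical example: four candidate intersections, an untrained policy
# (uniform logits), and an LLM prior that favors intersection 1.
policy_logits = [0.0, 0.0, 0.0, 0.0]
llm_scores = [0.0, 2.0, 0.0, 0.0]

fused_probs = softmax(fuse_logits(policy_logits, llm_scores, beta=1.0))
uniform_probs = softmax(fuse_logits(policy_logits, llm_scores, beta=0.0))
```

In this sketch, setting `beta=0.0` recovers the unmodified policy distribution, while larger values shift probability mass toward the actions the prior scores highly. This is one way a prior could reduce blind exploration early in training.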