Urban railway systems increasingly rely on communication-based train control (CBTC) systems, in which the optimal deployment of access points (APs) in tunnels is critical for robust wireless coverage. Traditional methods, such as empirical model-based optimization algorithms, are hindered by excessive measurement requirements and suboptimal solutions, while machine learning (ML) approaches often struggle with complex tunnel environments. This paper proposes a deep reinforcement learning (DRL) driven framework that integrates parabolic wave equation (PWE) channel modeling, conditional generative adversarial network (cGAN) based data augmentation, and a dueling deep Q-network (Dueling DQN) for AP placement optimization. The PWE method generates high-fidelity path-loss distributions for a subset of AP positions, which the cGAN then expands into high-resolution path-loss maps for all candidate positions, significantly reducing simulation cost while preserving physical accuracy. In the DRL framework, the state space captures AP positions and coverage, the action space defines AP adjustments, and the reward function encourages signal improvement while penalizing deployment cost. The Dueling DQN improves convergence speed and the exploration-exploitation balance, increasing the likelihood of reaching optimal configurations. Comparative experiments show that the proposed method outperforms a conventional Hooke-Jeeves optimizer and a standard DQN, delivering AP configurations with higher average received power, better worst-case coverage, and improved computational efficiency. By integrating high-fidelity electromagnetic simulation, generative modeling, and AI-driven optimization, this work offers a scalable and data-efficient solution for next-generation CBTC systems in complex tunnel environments.
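As a minimal illustration of the dueling architecture named above, the sketch below shows how a Dueling DQN head combines a state-value stream and an advantage stream into Q-values via Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). All shapes, weights, and the linear streams are hypothetical placeholders, not the paper's actual network; the paper's state (AP positions and coverage) and action (AP adjustments) spaces are only suggested by the comments.

```python
import numpy as np


def dueling_q(features: np.ndarray, w_v: np.ndarray, w_a: np.ndarray) -> np.ndarray:
    """Combine value and advantage streams into Q-values.

    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    Subtracting the mean advantage makes the V/A decomposition identifiable.
    """
    v = features @ w_v  # state-value stream V(s), shape (1,)
    a = features @ w_a  # advantage stream A(s, .), one entry per action
    return v + a - a.mean()


# Toy example: a hypothetical 4-dim state encoding (e.g. AP positions/coverage)
# and 3 discrete actions (e.g. move an AP left / right / keep it in place).
rng = np.random.default_rng(0)
features = rng.normal(size=4)
w_v = rng.normal(size=(4, 1))  # value-stream weights (illustrative)
w_a = rng.normal(size=(4, 3))  # advantage-stream weights (illustrative)
q = dueling_q(features, w_v, w_a)
```

In a full agent these two linear streams would be the heads of a shared deep network, and the agent would act greedily (or epsilon-greedily) on `q`; the mean-subtraction is the standard identifiability trick of the dueling architecture.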