The coexistence of NR-U and Wi-Fi in unlicensed spectrum poses a system-level resource coordination problem: their heterogeneous channel access mechanisms cause a significant imbalance in spectrum utilization and degrade Wi-Fi performance. To address this challenge, we propose a policy-driven deep reinforcement learning (DRL) framework for adaptive transmission opportunity (TXOP) control, in which the coexistence process is formulated as a Markov decision process (MDP) and a deep Q-network (DQN) learns control policies through online interaction. A key contribution is a policy layer, introduced through reward design, that enables explicit control of system-level tradeoffs among fairness, throughput, and quality of service (QoS). Three policies, namely absolute fairness, moderate fairness, and utility-based fairness, are developed to reach different operating points. Simulation results show that the proposed framework achieves a Jain fairness index above 0.9 under strict fairness control. Compared to absolute fairness, moderate fairness improves aggregate throughput by 68.22%, while the utility-based policy further enhances utility by 177.6%. These results demonstrate that policy-driven control provides a flexible and effective way to manage tradeoffs in heterogeneous coexistence networks.
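For readers unfamiliar with the fairness metric reported above, the following is a minimal sketch of the standard Jain fairness index, J = (Σx)² / (n · Σx²), applied to per-network throughputs. The function name `jain_index` and the sample throughput values are hypothetical illustrations and are not taken from the paper's simulations; the handling of an all-zero allocation is likewise an assumption.

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    n = len(throughputs)
    if n == 0:
        raise ValueError("need at least one throughput value")
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        # Assumption: an all-zero allocation is treated as trivially equal.
        return 1.0
    return total * total / (n * sum_sq)


# Hypothetical throughputs (Mbps) for an NR-U network and a Wi-Fi network.
balanced = jain_index([10, 10])    # equal shares
imbalanced = jain_index([30, 10])  # NR-U dominates the channel
```

An index of 1 corresponds to perfectly equal shares, and the value falls toward 1/n as one network monopolizes the channel, so a result above 0.9 indicates near-equal sharing between the two systems.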