This study addresses the challenge of optimal power allocation in stochastic wireless networks by employing a Deep Reinforcement Learning (DRL) framework. Specifically, we design a Deep Q-Network (DQN) agent that learns adaptive power-control policies directly from channel state observations, bypassing the need for explicit system models. We formulate the resource allocation problem as a Markov Decision Process (MDP) and benchmark the proposed approach against classical baselines, including fixed allocation, random assignment, and the theoretical water-filling algorithm. Empirical results show that the DQN agent achieves a system throughput of 3.88 Mbps, effectively matching the water-filling upper bound while outperforming the random and fixed allocation strategies by approximately 73% and 27%, respectively. Moreover, the agent exhibits emergent fairness, maintaining a Jain's fairness index of 0.91, and successfully balances the trade-off between spectral efficiency and energy consumption. These findings substantiate the efficacy of model-free DRL as a robust and scalable solution for resource management in next-generation communication systems.
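For readers unfamiliar with the two reference quantities cited above, the sketch below illustrates the standard Jain's fairness index and a bisection-based water-filling allocation of the kind typically used as an upper-bound benchmark. This is a minimal illustration, not the paper's implementation; the channel gains, noise model, and four-user example are hypothetical.

```python
import numpy as np

def jains_index(throughputs):
    """Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2), in (0, 1]."""
    x = np.asarray(throughputs, dtype=float)
    return x.sum() ** 2 / (len(x) * np.square(x).sum())

def water_filling(channel_gains, total_power, noise=1.0):
    """Classical water-filling power allocation via bisection on the water level."""
    g = np.asarray(channel_gains, dtype=float)
    inv = noise / g                      # inverse channel-to-noise ratios
    lo, hi = inv.min(), inv.max() + total_power
    for _ in range(100):                 # bisect until the power budget is met
        mu = 0.5 * (lo + hi)
        p = np.maximum(mu - inv, 0.0)
        if p.sum() > total_power:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.5 * (lo + hi) - inv, 0.0)

# Hypothetical 4-user example (illustrative values only)
gains = np.array([0.9, 0.5, 0.2, 0.7])
power = water_filling(gains, total_power=4.0)
rates = np.log2(1.0 + gains * power / 1.0)   # per-user spectral efficiency (bps/Hz)
print(power, rates, jains_index(rates))
```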