Optimizing large-scale wireless networks, including optimal resource management, power allocation, and throughput maximization, is inherently challenging due to their non-observable system dynamics and their heterogeneous, complex nature. Herein, a novel ensemble Q-learning algorithm is presented that addresses the performance and complexity challenges of the traditional Q-learning algorithm for optimizing wireless networks. Ensemble learning with synthetic Markov Decision Processes is tailored to wireless networks via new models for approximating large state-space observable wireless networks. In particular, digital cousins are proposed as an extension of the traditional digital twin concept, wherein multiple Q-learning algorithms are run in parallel on multiple synthetic Markovian environments and their outputs are fused into a single Q-function. Convergence analyses of key statistics and of the Q-functions are provided, together with derivations of upper bounds on the estimation bias and variance. Numerical results across a variety of real-world wireless networks show that the proposed algorithm can achieve up to 50% lower average policy error with up to 40% lower runtime complexity than state-of-the-art reinforcement learning algorithms. The theoretical results are also shown to correctly predict the trends observed in the experimental results.
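The core idea of running Q-learning on several synthetic environments and fusing the resulting Q-tables can be illustrated with a toy tabular sketch. This is not the authors' algorithm: the way the synthetic environments are generated here (random perturbation and renormalization of an estimated transition table `P_hat`), the weighted-average fusion rule with uniform default weights, and all function names (`q_learning`, `make_cousin`, `ensemble_q`, `greedy_policy`) are hypothetical assumptions chosen for illustration. The paper's digital cousins and fusion rule may differ.

```python
import random

# Hypothetical toy setup: a small tabular MDP given as P[s][a] -> list of
# next-state probabilities and R[s][a] -> immediate reward.


def q_learning(P, R, gamma=0.9, steps=5000, alpha=0.1, eps=0.2, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration on one MDP."""
    rng = random.Random(seed)
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    s = 0
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(n_a)  # explore
        else:
            a = max(range(n_a), key=lambda b: Q[s][b])  # exploit
        s_next = rng.choices(range(n_s), weights=P[s][a])[0]
        target = R[s][a] + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q


def make_cousin(P, noise, rng):
    """Hypothetical synthetic environment: perturb and renormalize each row of P."""
    cousin = []
    for row in P:
        new_row = []
        for probs in row:
            perturbed = [max(p + rng.uniform(-noise, noise), 1e-6) for p in probs]
            total = sum(perturbed)
            new_row.append([p / total for p in perturbed])
        cousin.append(new_row)
    return cousin


def ensemble_q(P_hat, R, n_cousins=4, noise=0.1, weights=None, seed=0):
    """Run Q-learning on several synthetic MDPs and fuse their Q-tables.

    The runs are independent, so they could execute in parallel; they run
    sequentially here for clarity. Fusion by weighted average is an assumption.
    """
    rng = random.Random(seed)
    envs = [make_cousin(P_hat, noise, rng) for _ in range(n_cousins)]
    tables = [q_learning(env, R, seed=seed + k) for k, env in enumerate(envs)]
    weights = weights or [1.0] * n_cousins
    total = sum(weights)
    n_s, n_a = len(R), len(R[0])
    return [[sum(w * Q[s][a] for w, Q in zip(weights, tables)) / total
             for a in range(n_a)] for s in range(n_s)]


def greedy_policy(Q):
    """Pick the action with the largest fused Q-value in each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in Q]


# Toy 2-state, 2-action MDP: action 1 always pays 1, action 0 always pays 0.
P_hat = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[0.0, 1.0], [0.0, 1.0]]
print(greedy_policy(ensemble_q(P_hat, R)))
```

In this toy MDP action 1 earns exactly one unit more than action 0 in every state, independent of the transition probabilities, so every synthetic environment agrees and the fused greedy policy should select action 1 in both states. Averaging Q-tables from several perturbed environments is one simple way to trade a small bias for reduced variance; whether that trade-off matches the bounds derived in the paper is not claimed here.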