This paper introduces a deep reinforcement learning-based block coordinate descent (DRL-based BCD) algorithm for the nonconvex weighted sum-rate maximization (WSRM) problem under a total power constraint. We first present an efficient block coordinate descent (BCD) method for the problem. We then integrate deep reinforcement learning (DRL) techniques into the BCD method to obtain the DRL-based BCD algorithm. This approach combines the data-driven learning capability of machine learning with the navigational and decision-making characteristics of the optimization-theoretic BCD method. The combination reduces the algorithm's sensitivity to initial points and mitigates the risk of becoming trapped in local optima, which significantly improves its performance. The primary advantages of the proposed algorithm are that it adheres to the constraints of the WSRM problem and significantly improves accuracy, potentially attaining the exact optimal solution. Moreover, unlike many pure machine-learning approaches, the DRL-based BCD algorithm exploits the theoretical analysis of the WSRM problem's structure, which makes it easy to train and computationally efficient while retaining a degree of interpretability. Numerical experiments show that the DRL-based BCD algorithm offers substantial advantages in effectiveness, efficiency, robustness, and interpretability for maximizing weighted sum rates. These results suggest its potential for designing resource-constrained, AI-native wireless optimization strategies in next-generation wireless networks.