Hybrid Centralized-Distributed Resource Allocation Based on Deep Reinforcement Learning for Cooperative D2D Communications

Device-to-device (D2D) technology enables direct communication between adjacent devices within cellular networks. Owing to its high data rates, low latency, and gains in spectrum and energy efficiency, it has been widely investigated and adopted as a key technology in 5G New Radio (NR). Beyond conventional overlay and underlay D2D communications, cooperative D2D communication, which benefits both cellular users (CUs) and D2D users (DUs) through cooperative relaying, has attracted extensive attention from academia and industry over the past decade. This paper studies the joint optimization of spectrum allocation, power control, and link matching between multiple CUs and DUs for cooperative D2D communications, using weighted sum energy efficiency (WSEE) as the performance metric to address the demands of green communication and sustainable development. This integer programming problem can be decomposed into a classic weighted bipartite graph matching problem and a series of nonconvex spectrum allocation and power control problems, one for each potentially matched pair of cellular and D2D links. To solve it, we propose a hybrid centralized-distributed scheme based on deep reinforcement learning (DRL) and the Kuhn-Munkres (KM) algorithm. Using DRL, the CUs and DUs autonomously optimize spectrum allocation and power control with only local information. The base station (BS) then determines the link matching with the KM algorithm. Simulation results show that the proposed scheme achieves near-optimal performance and significantly speeds up network convergence with low signaling overhead. We also propose cooperative link sets for the D2D links, which further accelerate the scheme and reduce signaling exchange.
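To make the centralized matching step concrete, the following is a minimal sketch of a Kuhn-Munkres style assignment in pure Python. It is an illustration under stated assumptions, not the paper's implementation: the function name `match_links`, the matrix `ee`, and its values are hypothetical. Each entry `ee[i][j]` stands in for the weighted energy efficiency that D2D link `i` and cellular link `j` would reach if paired, a quantity that in the described scheme would come from the distributed DRL stage. The sketch assumes there are at least as many cellular links (columns) as D2D links (rows).

```python
def match_links(weights):
    """Maximum-weight bipartite matching via the Kuhn-Munkres (Hungarian) method.

    weights[i][j] is the (hypothetical) pair weight of row i with column j.
    Requires len(weights) <= len(weights[0]).
    Returns (match, total), where match[i] is the column assigned to row i.
    """
    n, m = len(weights), len(weights[0])
    if n > m:
        raise ValueError("need at least as many columns as rows")
    # Negate the weights so that a min-cost assignment maximizes the total weight.
    cost = [[-w for w in row] for row in weights]
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Update potentials so that at least one new edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            match[p[j] - 1] = j - 1
    total = sum(weights[i][match[i]] for i in range(n))
    return match, total


# Hypothetical pair weights: rows are D2D links, columns are cellular links.
ee = [
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
]
match, total = match_links(ee)
print(match, total)
```

In this sketch only the weight matrix has to reach the base station, which is one plausible reading of why the hybrid design keeps signaling low: the nonconvex per-pair problems are solved locally, and the central step reduces to a polynomial-time assignment.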
