This study explores a next-generation multiple access (NGMA) framework for cell-free massive MIMO (CF-mMIMO) systems enhanced by stacked intelligent metasurfaces (SIMs), aiming to improve simultaneous wireless information and power transfer (SWIPT) performance. A fundamental challenge lies in optimally selecting the operating modes of the access points (APs) to maximize the received energy while satisfying spectral efficiency (SE) quality-of-service constraints. Practical system impairments are considered, including a non-linear harvested energy model, pilot contamination (PC), channel estimation errors, and reliance on long-term statistical channel state information (CSI). We derive closed-form expressions for both the achievable SE and the average sum harvested energy (sum-HE). A mixed-integer non-convex optimization problem is formulated to jointly optimize the SIM phase shifts, AP mode selection, and power allocation so as to maximize the average sum-HE under SE and average harvested energy constraints. To solve this problem, we propose a centralized training, decentralized execution (CTDE) framework based on deep reinforcement learning (DRL), which efficiently handles high-dimensional decision spaces. A Markovian environment and a normalized joint reward function are introduced to enhance training stability across both on-policy and off-policy DRL algorithms. Additionally, we provide a two-phase convex-based solution that serves as a robust theoretical performance benchmark. Numerical results demonstrate that the proposed DRL-based CTDE framework achieves SWIPT performance comparable to that of the convexification-based solution, while significantly outperforming the baselines.