The fluid antenna system (FAS) has emerged as a promising paradigm for next-generation wireless networks, as its position-flexible antenna elements can be moved dynamically to locations with more favorable channel conditions. However, jointly optimizing fluid antenna (FA) positions, beamforming, and power allocation in FA-assisted wireless networks is challenging because the problem is non-convex and the base stations (BSs) do not coordinate with one another. In this paper, we first formulate this optimization problem as a decentralized partially observable Markov decision process, and then propose a multi-agent group relative policy optimization (MAGRPO) algorithm under the centralized training decentralized execution (CTDE) paradigm. Compared with multi-agent proximal policy optimization (MAPPO), MAGRPO replaces the critic network with group relative advantage estimation, which reduces computational complexity by nearly half under parameter sharing. Furthermore, we derive an upper bound on the variance of the cumulative reward, which scales with network parameters such as the number of BSs, users, and FAs. Simulation results show that, compared with wireless networks with fixed antenna positions, FA-assisted wireless networks achieve a multi-fold sum-rate improvement. Moreover, the proposed MAGRPO attains sum-rates comparable to those of MAPPO in testing, while reducing training time by $30\%$ to $40\%$.
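To make the critic-free idea concrete, the following is a minimal sketch of group relative advantage estimation in the general GRPO style: each rollout's return is normalized by the mean and standard deviation of the returns in its sampling group, so no learned value function is needed. The abstract does not specify the exact estimator used in MAGRPO, so the function name `group_relative_advantages`, the `eps` stabilizer, and the sample returns are all illustrative assumptions rather than the authors' implementation.

```python
from statistics import fmean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """Normalize each rollout's return against its own sampling group.

    Assumption: a GRPO-style estimator, A_i = (R_i - mean(R)) / (std(R) + eps).
    The group baseline plays the role that a critic's value estimate
    plays in MAPPO.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    mean = fmean(returns)   # group baseline
    std = pstdev(returns)   # population standard deviation of the group
    return [(r - mean) / (std + eps) for r in returns]


# Hypothetical sum-rate returns of four rollouts sampled as one group.
advantages = group_relative_advantages([4.0, 6.0, 5.0, 9.0])
print(advantages)
```

Rollouts that beat the group average receive positive advantages and the rest receive negative ones. When every rollout in a group earns the same return, all advantages are zero and that group contributes no policy gradient.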