The fluid antenna system (FAS) leverages dynamic reconfigurability to unlock spatial degrees of freedom and reshape wireless channels. This paper proposes, for the first time, a robust fluid-antenna-driven blind interference alignment (BIA) framework for the K-user MISO downlink under imperfect channel state information (CSI). We formulate a robust sum-rate maximization problem by optimizing the fluid antenna positions. To solve this challenging non-convex problem, we employ group relative policy optimization (GRPO), a novel deep reinforcement learning algorithm that eliminates the critic network. This robust design reduces the model size and floating-point operations (FLOPs) by nearly half compared with proximal policy optimization (PPO), while its group-based exploration escapes poor local optima and significantly improves performance. Simulation results demonstrate that GRPO outperforms PPO by 4.17% and a 100K-step pre-trained PPO by 30.29%. By learning the CSI error distribution, GRPO exceeds the heuristic MaximumGain and RandomGain baselines by 200.78% and 465.38%, respectively.
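The critic-free design mentioned above rests on GRPO's core idea of scoring each sampled action against its own group rather than a learned value baseline. A minimal sketch of that group-relative advantage computation follows; the function name and the zero-spread guard are illustrative, not taken from the paper.

```python
import statistics

def group_relative_advantages(rewards):
    """Critic-free advantage estimation in the style of GRPO:
    each sampled action's reward is normalized by the mean and
    standard deviation of its sampled group, so no separate
    value (critic) network is required."""
    mu = statistics.mean(rewards)
    # Guard against a degenerate group where all rewards coincide.
    sigma = statistics.pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]
```

Because the baseline is the group mean, the advantages always sum to zero within a group, and actions with above-average reward receive positive advantage, which is what drives the group-based exploration described above.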