The fluid antenna system (FAS) leverages dynamic reconfigurability to unlock spatial degrees of freedom and reshape wireless channels, while blind interference alignment (BIA) aligns interference through antenna switching. This paper proposes, for the first time, a robust fluid-antenna-driven BIA framework for the K-user MISO downlink under imperfect channel state information (CSI). We formulate a robust sum-rate maximization problem by optimizing the fluid antenna positions (switching positions). To solve this challenging non-convex problem, we employ group relative policy optimization (GRPO), a recent deep reinforcement learning algorithm that eliminates the critic network. This design reduces model size and floating-point operations (FLOPs) by nearly half compared to proximal policy optimization (PPO), while its group-based exploration escapes poor local optima and significantly enhances performance. Simulation results demonstrate that GRPO outperforms PPO by 4.17% and a 100K-step pre-trained PPO by 30.29%. By learning the CSI error distribution, GRPO exceeds the heuristic MaximumGain and RandomGain baselines by 200.78% and 465.38%, respectively.
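The critic-free property of GRPO mentioned above comes from computing advantages relative to a group of sampled rollouts rather than from a learned value function. A minimal sketch of this group-relative advantage computation (function name and normalization constant are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def grpo_advantages(group_rewards):
    """Compute group-relative advantages for GRPO.

    Each rollout's reward (e.g., the achieved robust sum-rate for a
    sampled set of fluid antenna positions) is normalized by the mean
    and standard deviation of its group, so no critic network is
    needed to estimate a baseline.
    """
    r = np.asarray(group_rewards, dtype=float)
    # Small epsilon (assumed value) guards against zero variance.
    return (r - r.mean()) / (r.std() + 1e-8)
```

Because the baseline is the group mean, rollouts that beat their peers get positive advantages and the rest get negative ones, which drives the group-based exploration described above without the extra value-network parameters PPO carries.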