Fluid antenna systems (FAS) leverage dynamic reconfigurability to unlock spatial degrees of freedom and reshape wireless channels. Blind interference alignment (BIA) aligns interference through antenna switching. This paper proposes, for the first time, a robust fluid antenna-driven BIA framework for a K-user MISO downlink under imperfect channel state information (CSI). We formulate a robust sum-rate maximization problem in which the fluid antenna positions (the switching positions) are the optimization variables. To solve this challenging non-convex problem, we employ group relative policy optimization (GRPO), a novel deep reinforcement learning algorithm that eliminates the critic network. This design reduces model size and floating-point operations (FLOPs) by nearly half compared with proximal policy optimization (PPO), while group-based exploration helps it escape poor local optima and significantly enhances performance. Simulation results demonstrate that GRPO outperforms PPO by 4.17% and a 100K-step pre-trained PPO by 30.29%. Because it learns the error distribution, GRPO exceeds the heuristic MaximumGain and RandomGain baselines by 200.78% and 465.38%, respectively.
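To illustrate how a critic-free method can score actions, the following minimal sketch shows the group-relative advantage commonly used in GRPO: each sampled action's reward is normalized against the mean and standard deviation of its own group, so no learned value network is needed. The function name, the `eps` constant, and the example rewards are hypothetical and not taken from this paper; the reward is assumed here to be the sum rate achieved by a candidate set of antenna positions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's statistics.

    The group mean plays the role of the baseline that a critic
    network would otherwise estimate (assumed standard GRPO form).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical sum rates for a group of four sampled position sets
# drawn from the same state; these values are illustrative only.
group_rewards = [2.0, 4.0, 6.0, 8.0]
advantages = group_relative_advantages(group_rewards)
```

Samples that beat the group average receive positive advantages and the rest receive negative ones, which is the signal the policy update would use in place of a critic's value estimate.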