Model-Free Learning of Optimal Two-Stage Beamformers for Passive IRS-Aided Network Design

Electronically tunable metasurfaces, or Intelligent Reflective Surfaces (IRSs), are a popular technology for achieving high spectral efficiency in modern wireless systems by shaping channels using a multitude of tunable passive reflective elements. Motivated by key practical limitations of IRS-aided beamforming pertaining to system modeling and channel sensing/estimation, we propose a novel, fully data-driven Zeroth-order Stochastic Gradient Ascent (ZoSGA) algorithm for general two-stage (i.e., short/long-term), fully-passive IRS-aided stochastic utility maximization. ZoSGA learns long-term optimal IRS beamformers jointly with short-term optimal precoders (e.g., WMMSE-based) via minimal zeroth-order reinforcement and in a strictly model-free fashion, relying solely on the \textit{effective} compound channels observed at the terminals, while being independent of channel models or network/IRS configurations. ZoSGA is also amenable to analysis, which enables us to establish a state-of-the-art (SOTA) convergence rate of the order of $\boldsymbol{O}(\sqrt{S}\epsilon^{-4})$ under minimal assumptions, where $S$ is the total number of IRS elements, and $\epsilon$ is a desired suboptimality target. Our numerical results on a standard MISO downlink IRS-aided sum-rate maximization setting establish SOTA empirical behavior of ZoSGA as well, with ZoSGA consistently and substantially outperforming standard fully model-based baselines. Lastly, we demonstrate that ZoSGA can in fact operate \textit{in the field}, by directly optimizing the capacitances of a varactor-based electromagnetic IRS model (unknown to ZoSGA) in a multi-user, multi-IRS, compute-heavy network setting, with essentially no computational overhead or performance degradation.
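
The two-stage, model-free idea described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: a single-user MISO link with a hypothetical Rician-like channel hidden inside an `Environment` class, a matched-filter (MRT) precoder standing in for the short-term optimal precoder, and a generic two-point Gaussian-smoothing gradient estimate for the long-term IRS phases. All names, step sizes, and channel parameters are illustrative choices; the exact estimator and tuning used by ZoSGA may differ.

```python
import numpy as np

def cgauss(rng, shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

class Environment:
    """Hypothetical black box: the learner never sees G or h, only observe()."""
    def __init__(self, M=4, S=16, kappa=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.S, self.kappa = S, kappa
        # Fixed line-of-sight parts (assumed toy model, unknown to the learner).
        a = np.exp(1j * self.rng.uniform(0, 2 * np.pi, M)) / np.sqrt(M)
        b = np.exp(1j * self.rng.uniform(0, 2 * np.pi, S))
        self.G_los = np.outer(a, b)                      # IRS-to-BS link, M x S
        self.h_los = np.exp(1j * self.rng.uniform(0, 2 * np.pi, S))  # user-to-IRS link
        self.new_fading_block()

    def new_fading_block(self):
        """Draw a new short-term channel realization (LoS plus small scattering)."""
        self.G = self.G_los + self.kappa * cgauss(self.rng, self.G_los.shape)
        self.h = self.h_los + self.kappa * cgauss(self.rng, self.h_los.shape)

    def observe(self, theta):
        """Effective compound channel seen at the terminals for IRS phases theta."""
        return self.G @ (np.exp(1j * theta) * self.h)

def short_term_rate(h_eff, power=1.0, noise=1.0):
    """Short-term stage: MRT precoder (optimal for one user), then achieved rate."""
    w = np.sqrt(power) * h_eff / np.linalg.norm(h_eff)
    return np.log2(1.0 + np.abs(np.vdot(h_eff, w)) ** 2 / noise)

def zosga(env, theta, steps=3000, step_size=0.05, mu=1e-2, seed=1):
    """Long-term stage: two-point zeroth-order stochastic gradient ascent on theta."""
    rng = np.random.default_rng(seed)
    theta = theta.copy()
    for _ in range(steps):
        env.new_fading_block()
        u = rng.standard_normal(theta.size)              # random probing direction
        f_plus = short_term_rate(env.observe(theta + mu * u))
        f_minus = short_term_rate(env.observe(theta - mu * u))
        theta += step_size * (f_plus - f_minus) / (2 * mu) * u   # ascent step
    return theta

def average_rate(env, theta, n=500):
    """Monte Carlo estimate of the long-term utility for fixed IRS phases."""
    total = 0.0
    for _ in range(n):
        env.new_fading_block()
        total += short_term_rate(env.observe(theta))
    return total / n

env = Environment()
theta0 = np.zeros(env.S)
rate_before = average_rate(env, theta0)
theta_hat = zosga(env, theta0)
rate_after = average_rate(env, theta_hat)
print(f"average rate before: {rate_before:.2f} bits/s/Hz, after: {rate_after:.2f} bits/s/Hz")
```

Note that `zosga` touches the environment only through `observe`, so the channel model and the IRS-to-phase mapping can be swapped without changing the learner, which is the sense in which the sketch is model-free. Each iteration costs two probes of the effective channel and two precoder evaluations.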
