Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits

We study downlink beam and rate adaptation in a multi-user mmWave MISO system in which multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs), with a unique beam per UE and discrete data transmission rates. The BSs learn whether transmissions succeed from ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold $\tau_r$ and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting $\tau_r$ rather than merely maximizing throughput. Our main theoretical contribution consists of the first finite-time regret bounds for combinatorial semi-bandits with a satisficing objective: when $\tau_r$ is realizable, we upper bound the cumulative satisficing regret with respect to the target by a time-independent constant; when $\tau_r$ is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an $O((\log T)^2)$ standard regret bound. On the practical side, we evaluate performance via cumulative satisficing regret with respect to $\tau_r$, alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.
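
To make the distinction between satisficing and standard regret concrete, the following is a minimal sketch under stated assumptions, not the SAT-CTS policy itself. It assumes a single UE, a per-round satisficing regret of $\max(\tau_r - \mu, 0)$ for an arm with expected throughput $\mu$, and plain Beta-Bernoulli Thompson sampling on ACK/NACK feedback. The names `BetaArm`, `run_thompson`, and the success probabilities in the usage note are hypothetical and do not come from the paper.

```python
import random


def satisficing_regret(mean_throughput, tau):
    """Shortfall relative to the threshold tau; zero once tau is met (assumed definition)."""
    return max(tau - mean_throughput, 0.0)


def standard_regret(mean_throughput, best_mean):
    """Gap to the best achievable expected throughput."""
    return best_mean - mean_throughput


class BetaArm:
    """Beta posterior over the ACK probability of one (beam, rate) pair."""

    def __init__(self, rate):
        self.rate = rate
        self.alpha = 1.0  # uniform prior: one pseudo-ACK
        self.beta = 1.0   # uniform prior: one pseudo-NACK

    def sample_throughput(self, rng):
        # Sampled throughput = rate * sampled success probability.
        return self.rate * rng.betavariate(self.alpha, self.beta)

    def update(self, ack):
        if ack:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def run_thompson(success_prob, tau, horizon, seed=0):
    """Plain Thompson sampling over (beam, rate) pairs for a single UE.

    success_prob maps (beam_index, rate) -> ACK probability (hypothetical values).
    Returns (cumulative satisficing regret, cumulative standard regret).
    """
    rng = random.Random(seed)
    arms = {key: BetaArm(rate=key[1]) for key in success_prob}
    true_mean = {key: key[1] * p for key, p in success_prob.items()}
    best_mean = max(true_mean.values())

    sat_total, std_total = 0.0, 0.0
    for _ in range(horizon):
        # Draw one posterior sample per arm and play the arm with the largest draw.
        draws = {key: arm.sample_throughput(rng) for key, arm in arms.items()}
        choice = max(draws, key=draws.get)
        ack = rng.random() < success_prob[choice]  # simulated ACK/NACK
        arms[choice].update(ack)
        sat_total += satisficing_regret(true_mean[choice], tau)
        std_total += standard_regret(true_mean[choice], best_mean)
    return sat_total, std_total
```

For example, `run_thompson({(0, 1.0): 0.9, (0, 2.0): 0.6, (1, 2.0): 0.3}, tau=0.8, horizon=2000)` returns the two cumulative regrets for a toy instance with two beams and two rates. Whenever $\tau_r$ does not exceed the best expected throughput, the satisficing regret of each round is at most the standard regret of that round, which is why a threshold-aware policy can stop paying for exploration once it has found any tuple that meets the target.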
