The constrained combinatorial multi-armed bandit model has been widely used to address problems in wireless networking and related areas, including wireless scheduling for throughput optimization under unknown channel conditions. Most work in this area combines a bandit learning algorithm with the virtual queue technique, which tracks violations of the throughput constraints, and designs the resulting algorithms to keep the virtual queue lengths small. However, in networks where channel conditions change abruptly, the constraints may become infeasible, causing the virtual queue lengths to grow without bound. In this paper, we make the key observation that the head-of-line age, i.e., the age of the oldest packet in the virtual queue, has dynamics that make it a more robust quantity for algorithm design than the virtual queue length. We therefore design a learning-based scheduling policy that uses the head-of-line age in place of the virtual queue length. We show that our policy matches state-of-the-art performance under i.i.d. network conditions. Crucially, we also show that the system remains stable even under abrupt changes in channel conditions and recovers rapidly from periods of constraint infeasibility.
翻译:约束组合多臂老虎机模型已被广泛应用于解决无线网络及相关领域的问题,包括未知信道条件下吞吐量优化的无线调度问题。该领域大多数研究采用将老虎机学习算法与虚拟队列技术相结合的设计策略,以追踪吞吐量约束违反情况。这些算法在设计时致力于最小化虚拟队列长度。然而,在信道条件突变网络中,所得约束可能变得不可行,导致虚拟队列长度无限增长。本文提出关键观察:队首年龄(即虚拟队列中最旧数据包的年龄)的动态特性使其在算法设计中比虚拟队列长度更具鲁棒性。因此,我们设计了一种基于学习的调度策略,使用队首年龄替代虚拟队列长度。我们证明该策略在独立同分布网络条件下能达到最优性能。更重要的是,我们证明即使在信道条件突变时系统仍能保持稳定,并能从约束不可行期快速恢复。
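To make the two quantities contrasted above concrete, the following Python sketch keeps a per-link virtual queue at packet granularity, so that both its length and its head-of-line (HOL) age can be read off. It rests on several assumptions that are not taken from the paper: discrete time slots, virtual packets generated at the required throughput rate, FIFO service in which a successful transmission removes the oldest virtual packet, and age measured as the current slot minus the arrival slot. The class `VirtualQueue` and its methods are hypothetical illustrations, not the authors' policy. The `weight` method only shows where a scheduler could substitute the HOL age for the queue length as a per-link weight.

```python
from collections import deque


class VirtualQueue:
    """Hypothetical per-link virtual queue that remembers the arrival slot of
    every virtual packet, so both the queue length and the head-of-line (HOL)
    age can be read off. Illustrative sketch only."""

    def __init__(self):
        self._arrivals = deque()  # arrival slots, oldest packet on the left

    def step(self, t, arrived, served):
        """Advance one time slot t. If the link was served successfully, the
        oldest virtual packet departs (FIFO). Then record a new virtual
        arrival if one occurred in slot t."""
        if served and self._arrivals:
            self._arrivals.popleft()
        if arrived:
            self._arrivals.append(t)

    def length(self):
        """Virtual queue length: number of outstanding virtual packets."""
        return len(self._arrivals)

    def hol_age(self, t):
        """Age of the oldest outstanding packet at slot t (0 if empty)."""
        return t - self._arrivals[0] if self._arrivals else 0

    def weight(self, t, use_hol_age=True):
        """Per-link scheduling weight: the HOL age, or the queue length as in
        the conventional virtual-queue design."""
        return self.hol_age(t) if use_hol_age else self.length()


# Usage: one virtual arrival per slot but a successful service only every
# other slot, i.e. a deliberately infeasible throughput requirement.
q = VirtualQueue()
for t in range(10):
    q.step(t, arrived=True, served=(t % 2 == 1))
print(q.length(), q.hol_age(9))  # 5 4
```

In this toy infeasible scenario both quantities keep growing. The sketch only illustrates the bookkeeping; it does not reproduce the stability or recovery results claimed for the paper's policy.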