In this paper, we study scheduling in a queueing system with no knowledge of instantaneous network conditions. We consider a one-hop, single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates. Our scheduling approach builds on an innovative combination of adversarial bandit learning and Lyapunov drift minimization, and requires no knowledge of the instantaneous network state (the arrival and service rates) of each queue. We present two novel algorithms, \texttt{SoftMW} (SoftMaxWeight) and \texttt{SSMW} (Sliding-window SoftMaxWeight), both capable of stabilizing systems that can be stabilized by some (possibly unknown) sequence of randomized policies whose time-variation satisfies a mild condition. We further generalize our results to the setting where arrivals and departures have only bounded moments rather than being deterministically bounded, and propose \texttt{SoftMW+} and \texttt{SSMW+}, which stabilize the system in this setting. As a building block of our new algorithms, we also extend the classical \texttt{EXP3.S} algorithm (Auer et al., 2002) for multi-armed bandits to handle unboundedly large feedback signals, which may be of independent interest.
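For readers unfamiliar with the building block named above, the following is a minimal sketch of the classical \texttt{EXP3.S} update of Auer et al. (2002) for rewards bounded in $[0, 1]$. It is not the paper's extension to unbounded feedback, nor is it \texttt{SoftMW} or \texttt{SSMW}. The class name `Exp3S`, its method names, and the parameter names `gamma` (exploration rate) and `alpha` (weight-sharing rate) are illustrative assumptions, and the weight renormalization is added here only for numerical stability (it leaves the sampling probabilities unchanged).

```python
import math
import random


class Exp3S:
    """Sketch of classical EXP3.S for rewards in [0, 1] (hypothetical class name)."""

    def __init__(self, num_arms, gamma, alpha, seed=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must lie in (0, 1]")
        if alpha <= 0.0:
            raise ValueError("alpha must be positive")
        self.k = num_arms
        self.gamma = gamma
        self.alpha = alpha
        self.weights = [1.0] * num_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the normalized weights with uniform exploration.
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.k
            for w in self.weights
        ]

    def select(self):
        # Sample an arm; return it with the probability it was drawn with.
        probs = self.probabilities()
        arm = self.rng.choices(range(self.k), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted reward estimate: nonzero only for the pulled arm.
        estimate = reward / prob
        total = sum(self.weights)
        # Sharing term (e * alpha / K) * sum of weights, added to every arm,
        # which lets the learner track a changing best arm.
        share = math.e * self.alpha / self.k * total
        updated = []
        for j, w in enumerate(self.weights):
            if j == arm:
                w = w * math.exp(self.gamma * estimate / self.k)
            updated.append(w + share)
        # Renormalize (assumption for stability; the update is scale-invariant).
        norm = sum(updated)
        self.weights = [w / norm for w in updated]
```

In a scheduling context one might treat each of the $K$ queues as an arm and call `select()` once per time slot, then `update()` with a suitably scaled feedback signal; how the paper actually constructs that signal is not specified in this abstract.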