Queueing systems are widely applicable stochastic models, with use cases in communication networks, healthcare, and service systems. Although their optimal control has been extensively studied, most existing approaches assume perfect knowledge of system parameters. This assumption rarely holds in practice, where parameters are uncertain, which has motivated a recent line of work on bandit learning for queueing systems. This nascent stream of research focuses on the asymptotic performance of the proposed algorithms. In this paper, we argue that an asymptotic metric, which focuses on late-stage performance, is insufficient to capture the intrinsic statistical complexity of learning in queueing systems, which typically arises in the early stage. Instead, we propose the Cost of Learning in Queueing (CLQ), a new metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty. We characterize the CLQ of a single-queue multi-server system, and then extend these results to multi-queue multi-server systems and networks of queues. In establishing our results, we propose a unified analysis framework for CLQ that bridges Lyapunov and bandit analysis, which could be of independent interest.