Queueing systems are widely applicable stochastic models with use cases in communication networks, healthcare, service systems, etc. Although their optimal control has been extensively studied, most existing approaches assume perfect knowledge of the system parameters. This assumption rarely holds in practice, where parameters are uncertain, which has motivated a recent line of work on bandit learning for queueing systems. This nascent stream of research focuses on the asymptotic performance of the proposed algorithms. In this paper, we argue that an asymptotic metric, which focuses on late-stage performance, is insufficient to capture the intrinsic statistical complexity of learning in queueing systems, which typically arises in the early stage. Instead, we propose the Cost of Learning in Queueing (CLQ), a new metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty. We characterize the CLQ of a single-queue multi-server system, and then extend these results to multi-queue multi-server systems and networks of queues. In establishing our results, we propose a unified analysis framework for CLQ that bridges Lyapunov and bandit analysis, provides guarantees for a wide range of algorithms, and could be of independent interest.