We address a control system optimization problem arising in the scheduling of multi-class, multi-server queueing systems under uncertainty. Jobs incur holding costs while awaiting completion, and each job-server assignment yields an observable stochastic reward whose mean is unknown. The rewards for job-server assignments are assumed to follow a bilinear model in the features that characterize jobs and servers. Our objective is regret minimization: to maximize the cumulative reward of job-server assignments over a time horizon while keeping the total job holding cost bounded, which ensures stability of the queueing system. The problem is motivated by applications in computing services and online platforms. To solve it, we propose a scheduling algorithm based on a weighted proportional fair allocation criterion, augmented with marginal costs for reward maximization and combined with a bandit strategy. The algorithm achieves regret and mean holding cost (and a queue length bound) that are sub-linear in the time horizon, thus guaranteeing stability of the queueing system. We also establish stability conditions for distributed iterative algorithms that compute the allocations, which are relevant to large-scale systems. Finally, we validate the efficiency of our algorithm through numerical experiments.
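To make the bilinear reward model concrete, the following is a minimal sketch, not the algorithm proposed in this work. It assumes the mean reward of assigning a job with feature vector `x` to a server with feature vector `y` has the form `x^T Theta y` for an unknown matrix `Theta`, and it uses a generic ridge-regression estimate with an upper-confidence bonus as a stand-in for the bandit strategy, which the abstract does not specify. All names (`bilinear_reward`, `vectorized_feature`, `RidgeUCB`, `reg`, `alpha`) are hypothetical.

```python
import numpy as np


def bilinear_reward(x, theta, y):
    """Mean reward x^T Theta y for job features x and server features y."""
    return float(x @ theta @ y)


def vectorized_feature(x, y):
    """Flatten the outer product x y^T, so that
    x^T Theta y = <Theta.ravel(), vectorized_feature(x, y)>."""
    return np.outer(x, y).ravel()


class RidgeUCB:
    """Generic linear-bandit estimator (an assumption, not the paper's method)."""

    def __init__(self, dim, reg=1.0, alpha=1.0):
        self.A = reg * np.eye(dim)  # regularized Gram matrix
        self.b = np.zeros(dim)      # reward-weighted feature sum
        self.alpha = alpha          # width of the confidence bonus

    def update(self, z, reward):
        """Record one observed reward for the feature vector z."""
        self.A += np.outer(z, z)
        self.b += reward * z

    def estimate(self):
        """Ridge estimate of the vectorized parameter matrix."""
        return np.linalg.solve(self.A, self.b)

    def ucb(self, z):
        """Optimistic reward index: estimated mean plus exploration bonus."""
        width = np.sqrt(z @ np.linalg.solve(self.A, z))
        return float(self.estimate() @ z + self.alpha * width)
```

Vectorizing the outer product reduces the bilinear model to a linear one in `Theta.ravel()`, which is why a standard linear-bandit estimator can be applied to it in this sketch. How such indices enter the weighted proportional fair allocation is not described in the abstract and is left out here.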