To leverage prediction models for optimal scheduling decisions in service systems, we must understand how prediction errors impact congestion through externalities on the delay of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising many single-server queues in which the class of each job is estimated by a prediction model. By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner. Our theoretical results guide the design of predictive models by providing a simple model selection procedure that treats downstream queueing performance as a central concern, and offer novel insights into how to design queueing systems with AI-based triage. We illustrate our framework on a content moderation task based on real online comments, where we construct toxicity classifiers by fine-tuning large language models.