To leverage prediction models for optimal scheduling decisions in service systems, we must understand how prediction errors affect congestion through their externalities on the delays of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising many single-server queues, in which the class of each job is estimated by a prediction model. By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner. Our theoretical results guide the design of predictive models by providing a simple model selection procedure that treats downstream queueing performance as a central concern, and they offer novel insights on how to design queueing systems with AI-based triage. We illustrate our framework on a content moderation task based on real online comments, where we construct toxicity classifiers by fine-tuning large language models.
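The abstract does not give the form of the index. Purely as an illustration of what "index-based with predicted classes" can mean, the sketch below applies the classical cμ rule to predicted labels: jobs sharing a predicted label are pooled, and each label is ranked by its expected holding-cost rate divided by its expected service time, with expectations taken over the posterior of the true class given the predicted label. This is a swapped-in textbook rule, not the authors' policy, and the labels, costs, service times, and posteriors (which would come from, e.g., a held-out confusion matrix) are hypothetical.

```python
def cmu_index(posterior, cost, mean_service):
    """c-mu style index for all jobs that share one predicted label.

    posterior[j]    : P(true class = j | predicted label)  (hypothetical input)
    cost[j]         : holding-cost rate of a true class-j job (hypothetical)
    mean_service[j] : mean service time of a true class-j job (hypothetical)
    """
    # Expected cost rate and expected service time of a job with this label.
    expected_cost = sum(p * cost[j] for j, p in posterior.items())
    expected_service = sum(p * mean_service[j] for j, p in posterior.items())
    # Higher index = serve first (cost per unit of server time).
    return expected_cost / expected_service


def priority_order(posteriors, cost, mean_service):
    """Return predicted labels sorted from highest to lowest priority."""
    return sorted(
        posteriors,
        key=lambda label: cmu_index(posteriors[label], cost, mean_service),
        reverse=True,
    )


# Hypothetical two-class content-moderation example.
cost = {"toxic": 10.0, "benign": 1.0}
mean_service = {"toxic": 2.0, "benign": 1.0}
posteriors = {
    "toxic": {"toxic": 0.8, "benign": 0.2},   # label "toxic" is 80% precise
    "benign": {"toxic": 0.1, "benign": 0.9},  # label "benign" hides 10% toxic
}
print(priority_order(posteriors, cost, mean_service))
```

Because mispredictions enter only through the posterior, a less precise classifier pulls the two indices toward each other, which is one simple way to see why prediction quality matters for downstream congestion.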