Motivated by applications such as machine repair, project monitoring, and anti-poaching patrol scheduling, we study intervention planning for stochastic processes under resource constraints. This planning problem has previously been modeled as a restless multi-armed bandit (RMAB), where each arm is an intervention-dependent Markov decision process. However, the existing literature assumes that all intervention resources belong to a single uniform pool, which limits its applicability to real-world settings where interventions are carried out by a set of workers, each with their own costs, budgets, and intervention effects. In this work, we consider a novel RMAB setting with heterogeneous workers, called multi-worker restless bandits (MWRMAB). The goal is to plan an intervention schedule that maximizes the expected reward while satisfying the budget constraint of each worker and keeping the load assigned across workers fair. Our contributions are two-fold: (1) we provide a multi-worker extension of the Whittle index to handle heterogeneous costs and per-worker budgets, and (2) we develop an index-based scheduling policy to achieve fairness. Further, we evaluate our method on various cost structures and show that it significantly outperforms the baselines in terms of fairness without sacrificing much accumulated reward.