We study the sequential decision-making problem of allocating a limited resource to agents that reveal their stochastic demands on arrival over a finite horizon. Our goal is to design fair allocation algorithms that exhaust the available resource budget. This is challenging in sequential settings, where information on future demands is not available at the time of decision-making. We formulate the problem as a discrete-time Markov decision process (MDP). We propose a new algorithm, SAFFE, that makes allocations that are fair with respect to all demands revealed over the horizon by accounting for expected future demands at each arrival time. The algorithm introduces regularization that allows currently revealed demands to be prioritized over potential future demands, depending on the uncertainty in agents' future demands. Using the MDP formulation, we show that SAFFE optimizes allocations based on an upper bound on the Nash Social Welfare fairness objective, and we bound its gap to optimality using concentration bounds on total future demands. Using synthetic and real data, we compare the performance of SAFFE against existing approaches and a reinforcement learning policy trained on the MDP. We show that SAFFE leads to fairer and more efficient allocations and achieves close-to-optimal performance in settings with dense arrivals.
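To make the idea of accounting for expected future demands concrete, the following is a minimal sketch under simplifying assumptions, and it is not SAFFE itself. It assumes one agent arrives per time step, that a point forecast of each later demand is available, and it swaps in max-min fair water-filling for the Nash Social Welfare objective; it omits the regularization described above. The names `water_fill`, `sequential_allocate`, `revealed`, and `expected` are hypothetical and chosen only for illustration.

```python
def water_fill(demands, budget):
    """Max-min fair split: no agent gets more than its demand, and the
    total handed out equals min(budget, sum(demands))."""
    alloc = [0.0] * len(demands)
    remaining = float(budget)
    # Visit agents from smallest to largest demand.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for pos, i in enumerate(order):
        # Equal split of what is left among the agents not yet served.
        share = remaining / (len(order) - pos)
        alloc[i] = min(demands[i], share)
        remaining -= alloc[i]
    return alloc


def sequential_allocate(revealed, expected, budget):
    """Illustrative online rule (an assumption, not the paper's algorithm):
    at step t, plan a fair split over the demand just revealed plus the
    forecasts for later arrivals, then commit only the current agent's share."""
    allocations = []
    remaining = float(budget)
    for t, demand in enumerate(revealed):
        plan = water_fill([demand] + list(expected[t + 1:]), remaining)
        grant = plan[0]  # share planned for the agent present now
        allocations.append(grant)
        remaining -= grant
    return allocations


if __name__ == "__main__":
    revealed = [4, 2, 6]   # demands observed on arrival (hypothetical data)
    expected = [3, 3, 3]   # point forecasts of each step's demand (hypothetical)
    print(sequential_allocate(revealed, expected, budget=6))
```

With these hypothetical inputs, a first-come-first-served rule would hand out 4, 2, and 0 units, leaving the last agent with nothing, whereas the forecast-aware sketch reserves budget for later arrivals and allocates 2 units to each agent while still exhausting the budget.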