We study online fair allocation of $T$ sequentially arriving items among $n$ agents with heterogeneous preferences, with the objective of maximizing generalized-mean welfare, defined as the $p$-mean of the agents' time-averaged utilities for $p\in (-\infty, 1)$. We first consider the i.i.d. arrival model and show that the pure greedy algorithm -- which myopically chooses the welfare-maximizing integral allocation in each period -- achieves $\widetilde{O}(1/T)$ average regret. Importantly, in contrast to prior work, our algorithm requires no distributional knowledge and achieves the optimal regret rate using only the online samples. We then go beyond i.i.d. arrivals and investigate a non-stationary model with time-varying independent distributions. Absent additional data about these distributions, it is known that every online algorithm must suffer $\Omega(1)$ average regret. We show that a single historical sample from each distribution suffices to recover the optimal $\widetilde{O}(1/T)$ average regret rate, even in the face of arbitrary non-stationarity. Our algorithms are based on the re-solving paradigm: in every period, they assume that the remaining items will be those observed historically in the corresponding periods and solve the resulting welfare-maximization problem to determine the current decision. Finally, we also account for distribution shifts that may distort the fidelity of the historical samples and show that the performance of our re-solving algorithms is robust to such shifts.
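As a concrete illustration of the greedy rule described above, the following is a minimal Python sketch: each arriving item is assigned to the agent whose receipt maximizes the $p$-mean of cumulative utilities. The function names (`p_mean`, `greedy_allocate`) and the small floor on zero utilities (needed so the $p$-mean is well defined for $p \le 0$) are our own illustrative choices, not the paper's implementation.

```python
import math

EPS = 1e-12  # floor on utilities so the p-mean is defined when some utility is 0 and p <= 0


def p_mean(utils, p):
    """Generalized (power) mean of utilities; the limit p -> 0 is the geometric
    mean, i.e. Nash social welfare, which we use directly when p == 0."""
    if p == 0:
        return math.exp(sum(math.log(max(u, EPS)) for u in utils) / len(utils))
    return (sum(max(u, EPS) ** p for u in utils) / len(utils)) ** (1.0 / p)


def greedy_allocate(value_stream, n, p):
    """Myopic greedy: give each arriving item to the agent whose receipt
    maximizes the resulting p-mean welfare of cumulative utilities."""
    utils = [0.0] * n
    assignment = []
    for values in value_stream:  # values[i] = agent i's value for the current item
        best_agent = max(
            range(n),
            key=lambda i: p_mean(
                [utils[j] + (values[j] if j == i else 0.0) for j in range(n)], p
            ),
        )
        utils[best_agent] += values[best_agent]
        assignment.append(best_agent)
    return assignment, utils
```

For example, with two agents, Nash welfare ($p = 0$), and two items valued $(1, 0)$ and $(0, 1)$, the greedy rule gives each agent the item it values, since concentrating both on one agent would leave the other at zero utility and collapse the geometric mean.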