This paper considers a multi-player resource-sharing game with a fair reward allocation model. Multiple players choose from a collection of resources, and each resource yields a random reward that is divided equally among the players who choose it. We consider two settings. The first is a one-slot game in which the mean rewards of the resources are known to all players, and the objective of player 1 is to maximize their worst-case expected utility. Certain special cases of this setting admit explicit solutions, which provide interesting yet non-intuitive insights into the problem. The second is an online setting in which the game is played over a finite time horizon and the mean rewards are unknown to player 1. Instead, after each action, player 1 receives as feedback the rewards of the resources they chose. We develop a novel Upper Confidence Bound (UCB) algorithm that uses this feedback to minimize the worst-case regret of player 1.
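To make the fair allocation rule and the worst-case objective of the one-slot setting concrete, the following is a minimal sketch rather than the method of the paper. It assumes that each player picks exactly one resource, that player 1 may randomize over resources, and that the worst case is taken by brute force over all pure choices of the other players. The names `expected_utility` and `worst_case_utility` are hypothetical and introduced here only for illustration.

```python
from itertools import product


def expected_utility(means, p1_choice, others):
    """Expected reward of player 1 when picking resource `p1_choice`.

    Assumption: the mean reward of a resource is split equally among
    all players who picked it (player 1 plus any matching opponents).
    """
    sharers = 1 + sum(1 for c in others if c == p1_choice)
    return means[p1_choice] / sharers


def worst_case_utility(means, p1_mix, n_others):
    """Minimum expected utility of player 1 over all opponent profiles.

    `p1_mix[k]` is the probability that player 1 picks resource k.
    Brute force: enumerates len(means) ** n_others opponent profiles,
    so this is only practical for very small instances.
    """
    worst = float("inf")
    for others in product(range(len(means)), repeat=n_others):
        value = sum(
            p * expected_utility(means, k, others)
            for k, p in enumerate(p1_mix)
        )
        worst = min(worst, value)
    return worst


if __name__ == "__main__":
    means = [4.0, 2.0]  # illustrative mean rewards, not from the paper
    print(worst_case_utility(means, [1.0, 0.0], n_others=1))
    print(worst_case_utility(means, [0.25, 0.75], n_others=1))
```

In this toy instance with one opponent, always picking the better resource guarantees player 1 half of its mean reward, since the opponent can always share it; shifting probability toward the weaker resource lowers that guarantee.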