This paper considers a two-player game in which each player chooses a resource from a finite collection of options. Each resource yields a random reward. Both players have statistical information about the reward of each resource. In addition, there is an information asymmetry: the players know the reward realizations of different subsets of the resources. If both players choose the same resource, its reward is divided equally between them, whereas if they choose different resources, each player gains the full reward of the resource they chose. We first implement the iterative best response algorithm to find an $\epsilon$-approximate Nash equilibrium for this game. Finding a Nash equilibrium in this way may not be desirable when the players do not trust each other and place no assumptions on the incentives of the opponent. To handle this case, we solve the problem of maximizing the worst-case expected utility of the first player. The solution leads to counter-intuitive insights in certain special cases. To solve the general version of the problem, we develop an efficient algorithmic solution that combines online convex optimization and the drift-plus-penalty technique.
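As a minimal illustration of the reward-sharing rule and of a single best-response step, the following Python sketch may help. It is not the paper's algorithm: it assumes a simplified setting in which the information asymmetry is ignored, so the opponent's mixed strategy is treated as independent of the reward realizations and only mean rewards matter. The names `payoff`, `expected_utilities`, and `best_response` are hypothetical and chosen for this example.

```python
from typing import List, Sequence


def payoff(choice_self: int, choice_other: int, rewards: Sequence[float]) -> float:
    """Reward to a player who picks `choice_self` while the opponent picks `choice_other`.

    The reward is halved on a collision and received in full otherwise.
    """
    reward = rewards[choice_self]
    return reward / 2 if choice_self == choice_other else reward


def expected_utilities(mean_rewards: Sequence[float],
                       opponent_mix: Sequence[float]) -> List[float]:
    """Expected utility of each resource against a fixed opponent mixed strategy.

    Simplifying assumption (not from the paper): the opponent's choice is
    independent of the reward realizations. With probability q_k the opponent
    also picks resource k, so the expected utility is m_k * (1 - q_k / 2).
    """
    return [m * (1.0 - q / 2.0) for m, q in zip(mean_rewards, opponent_mix)]


def best_response(mean_rewards: Sequence[float],
                  opponent_mix: Sequence[float]) -> int:
    """Index of a resource that maximizes expected utility (one best-response step)."""
    utils = expected_utilities(mean_rewards, opponent_mix)
    return max(range(len(utils)), key=utils.__getitem__)
```

For instance, with mean rewards `[4.0, 3.0]` and an opponent who always picks resource 0, sharing resource 0 is worth 2.0 in expectation while resource 1 is worth 3.0, so `best_response` returns index 1. An iterative best response scheme would alternate such steps between the two players. The setting studied in the paper additionally conditions each player's strategy on the reward realizations that player observes.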