This paper considers a two-player game in which each player chooses a resource from a finite collection of options. Each resource yields a random reward. Both players have statistical information about the reward of each resource. In addition, there is an information asymmetry: each player knows the reward realizations of a different subset of the resources. If both players choose the same resource, its reward is divided equally between them, whereas if they choose different resources, each player gains the full reward of the resource they chose. We first implement the iterative best response algorithm to find an $\epsilon$-approximate Nash equilibrium for this game. Finding a Nash equilibrium in this way may not be desirable when players do not trust each other and make no assumptions about the opponent's incentives. To handle this case, we solve the problem of maximizing the worst-case expected utility of the first player. The solution leads to counter-intuitive insights in certain special cases. To solve the general version of the problem, we develop an efficient algorithmic solution that combines online convex optimization and the drift-plus-penalty technique.
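The reward-sharing rule described in the abstract can be illustrated with a minimal Python sketch. It is a simplification and not the paper's method: it assumes the reward values are passed in directly (so it ignores the information asymmetry), and the names `payoffs` and `best_response` are hypothetical helpers introduced only for illustration. The second function shows a single best-response step against a fixed mixed strategy of the opponent, which is the building block that an iterative best response scheme would repeat.

```python
from typing import Sequence, Tuple


def payoffs(choice_1: int, choice_2: int, rewards: Sequence[float]) -> Tuple[float, float]:
    """Return (payoff of player 1, payoff of player 2) for one play of the game."""
    if choice_1 == choice_2:
        # Same resource: its reward is split equally.
        half = rewards[choice_1] / 2.0
        return half, half
    # Different resources: each player keeps the full reward of their own choice.
    return rewards[choice_1], rewards[choice_2]


def best_response(rewards: Sequence[float], opponent_probs: Sequence[float]) -> int:
    """Pick the resource with the highest expected payoff.

    Assumption (not from the paper): the rewards are known to this player and
    the opponent picks resource k with probability opponent_probs[k].
    Choosing k yields rewards[k] if the opponent is elsewhere and rewards[k] / 2
    on a collision, so the expected payoff is rewards[k] * (1 - opponent_probs[k] / 2).
    """
    values = [r * (1.0 - q / 2.0) for r, q in zip(rewards, opponent_probs)]
    return max(range(len(values)), key=values.__getitem__)
```

For example, with rewards `[4.0, 3.0]` and an opponent who always picks resource 0, the best response is resource 1, since keeping the full reward of 3.0 beats half of 4.0.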