This paper considers a two-player game in which each player chooses a resource from a finite collection of options without knowing the opponent's choice and without receiving any form of feedback. Each resource yields a random reward. Both players have statistical information about the rewards of each resource. In addition, there is an information asymmetry: each player knows the reward realizations of a different subset of the resources. If both players choose the same resource, the reward is divided equally between them, whereas if they choose different resources, each player gains the full reward of the resource they chose. We first implement the iterative best response algorithm to find an $\epsilon$-approximate Nash equilibrium for this game. This method of finding a Nash equilibrium is impractical when players do not trust each other and place no assumptions on the incentives of the opponent. To handle this case, we solve the problem of maximizing the worst-case expected utility of the first player. The solution leads to counter-intuitive insights in certain special cases. To solve the general version of the problem, we develop an efficient algorithmic solution that combines online convex optimization and the drift-plus-penalty technique.
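To make the payoff rule and the worst-case objective concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a simplified setting in which the first player knows only the mean reward of each resource (the private reward observations are ignored) and plays a fixed mixed strategy `p`. The function names `payoffs` and `worst_case_expected_utility` are hypothetical and chosen for illustration.

```python
def payoffs(choice_a, choice_b, rewards):
    """Return (payoff_a, payoff_b) for one round, given realized rewards.

    Same resource: the reward is split equally.
    Different resources: each player keeps the full reward of their own pick.
    """
    if choice_a == choice_b:
        share = rewards[choice_a] / 2
        return share, share
    return rewards[choice_a], rewards[choice_b]


def worst_case_expected_utility(p, mean_rewards):
    """Worst-case expected utility of player 1 under mixed strategy p.

    Simplifying assumption (not from the paper): only mean rewards are used.
    If the opponent picks resource j, player 1 loses half of the reward of j
    whenever player 1 also picks j, which happens with probability p[j].
    The expected utility is linear in the opponent's strategy, so checking
    the opponent's pure choices is enough.
    """
    base = sum(pi * mi for pi, mi in zip(p, mean_rewards))
    worst_loss = max(pi * mi / 2 for pi, mi in zip(p, mean_rewards))
    return base - worst_loss


if __name__ == "__main__":
    print(payoffs(0, 0, [4.0, 2.0]))
    print(payoffs(0, 1, [4.0, 2.0]))
    print(worst_case_expected_utility([0.5, 0.5], [4.0, 2.0]))
```

In this simplified setting, maximizing `worst_case_expected_utility` over `p` is a concave maximization problem over the probability simplex, which is the kind of max-min objective the abstract refers to.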