We study the problem of adapting to a known sub-rational opponent during online play while remaining robust to rational opponents. We focus on large imperfect-information (zero-sum) games, in which it is impossible to inspect the whole game tree at once, necessitating depth-limited search. However, all existing methods assume rational play beyond the depth limit, which allows them to adapt to only a very limited portion of the opponent's behaviour. We propose an algorithm, Adapting Beyond Depth-limit (ABD), that uses a strategy-portfolio approach, which we refer to as matrix-valued states, for depth-limited search. This allows the algorithm to fully utilise all information in the opponent model, making it the first robust-adaptation method able to do so in large imperfect-information games. As an additional benefit, the use of matrix-valued states makes the algorithm simpler than traditional methods based on optimal value functions. Our experimental results in poker and battleship show that ABD yields more than a twofold increase in utility when facing opponents who make mistakes beyond the depth limit, and also delivers significant improvements in utility and safety against randomly generated opponents.
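To make the strategy-portfolio idea concrete, below is a minimal sketch of how a matrix-valued state might be resolved at the depth limit. This is an illustration under our own assumptions, not the paper's implementation: the function name `leaf_value`, the NumPy representation, and the `opp_model` argument are hypothetical. The entry `V[i, j]` holds the expected utility when the searcher commits to continuation strategy `i` from its portfolio and the opponent commits to strategy `j` from theirs; without an opponent model the sketch falls back to a robust value (pure-strategy maximin here for brevity; solving the matrix game would give the exact robust value), while a modelled distribution over the opponent's portfolio is exploited by best response, which is what lets the search use the opponent model beyond the depth limit.

```python
import numpy as np

def leaf_value(V: np.ndarray, opp_model: np.ndarray | None = None) -> float:
    """Resolve a matrix-valued state at the depth limit.

    V[i, j]: expected utility if we play continuation strategy i
    from our portfolio and the opponent plays strategy j from theirs.
    opp_model: optional distribution over the opponent's portfolio.
    """
    if opp_model is None:
        # Robust fallback: pure-strategy maximin over the portfolio
        # (a simplification; the exact robust value solves the matrix game).
        return float(V.min(axis=1).max())
    # Adaptive play: best-respond to the modelled distribution over the
    # opponent's portfolio, exploiting mistakes beyond the depth limit.
    return float((V @ opp_model).max())
```

For instance, with `V = np.array([[1., -1.], [-1., 1.]])` and a model `opp_model = np.array([0.9, 0.1])`, the adaptive value is 0.8, whereas the robust fallback yields -1, illustrating the utility gained by exploiting the model at the depth limit.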