We study the problem of adapting to a known sub-rational opponent during online play while remaining robust to rational opponents. We focus on large imperfect-information (zero-sum) games, in which it is impossible to inspect the whole game tree at once, necessitating the use of depth-limited search. However, all existing methods assume rational play beyond the depth limit, which allows them to adapt to only a very limited portion of the opponent's behaviour. We propose Adapting Beyond Depth-limit (ABD), an algorithm that uses a strategy-portfolio approach, which we refer to as matrix-valued states, for depth-limited search. This allows the algorithm to fully utilise all information about the opponent model, making it the first robust-adaptation method able to do so in large imperfect-information games. As an additional benefit, the use of matrix-valued states makes the algorithm simpler than traditional methods based on optimal value functions. Our experimental results in poker and Battleship show that ABD yields more than a twofold increase in utility when facing opponents who make mistakes beyond the depth limit, and also delivers significant improvements in utility and safety against randomly generated opponents.
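To make the core idea concrete, the following is a minimal, purely illustrative sketch of how a matrix-valued state at the depth limit differs from a scalar value. All names, the payoff matrix `U`, and the opponent model `p` are hypothetical assumptions for illustration, not the paper's implementation: the leaf stores a matrix of continuation values indexed by a portfolio of strategies for each player, so search can score the leaf against a modelled opponent rather than only against worst-case rational play.

```python
import numpy as np

# Hypothetical matrix-valued state at a depth-limit leaf.
# U[i, j] = expected utility if we continue the game with portfolio
# strategy i and the opponent continues with portfolio strategy j.
U = np.array([[1.0, -0.5],
              [0.2,  0.8]])  # rows: our portfolio, cols: opponent's

# Standard depth-limited search assumes rational (worst-case) play
# beyond the limit -- the assumption the abstract says ABD relaxes:
rational_value = U.min(axis=1).max()      # maximin over the portfolio

# Given an opponent model p over the opponent's continuation
# strategies, the same leaf can instead be scored against the
# modelled (possibly sub-rational) behaviour, letting search adapt
# to mistakes made beyond the depth limit:
p = np.array([0.9, 0.1])                  # assumed opponent model
adaptive_value = (U @ p).max()

print(rational_value, adaptive_value)     # e.g. 0.2 vs 0.85
```

In this toy example the modelled opponent mostly plays the exploitable first column, so the adaptive evaluation (0.85) is far higher than the worst-case one (0.2); a scalar value function would have collapsed the matrix and discarded exactly the information needed for this adaptation.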