Limited look-ahead game solving for imperfect-information games was the breakthrough that made it possible to defeat expert human players in large poker games. Existing algorithms of this type assume that all players are perfectly rational, and they do not allow explicit modeling and exploitation of the opponent's flaws. As a result, even very weak opponents can tie, or lose only very slowly, against these powerful methods. We present the first algorithm that allows opponent models to be incorporated into limited look-ahead game solving. Using only an approximation of a single (optimal) value function, the algorithm efficiently exploits an arbitrary estimate of the opponent's strategy while guaranteeing a bounded worst-case loss for the player. We also show why using existing resolving gadgets is problematic and why the previously solved parts of the game must be kept. In experiments on three different games, our algorithm achieves over half of the maximum possible exploitation while risking almost no loss.
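The trade-off between exploiting an opponent model and bounding the worst-case loss can be illustrated on a toy one-shot game. The sketch below is not the algorithm summarized above: it uses rock-paper-scissors with no look-ahead or value function, and it simply mixes the equilibrium strategy with a best response to the model. The opponent model, the mixing weight `p`, and all function names are assumptions made for illustration.

```python
# Illustrative only: a one-shot matrix game, not limited look-ahead solving.
# Payoffs to us in rock-paper-scissors; rows are our action, columns the opponent's.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]
EQUILIBRIUM = [1 / 3, 1 / 3, 1 / 3]  # unexploitable strategy; the game value is 0


def expected_payoff(ours, theirs):
    """Our expected payoff when both sides play the given mixed strategies."""
    return sum(ours[i] * theirs[j] * PAYOFF[i][j] for i in range(3) for j in range(3))


def best_response(opponent_model):
    """Pure strategy that maximizes our payoff against the assumed model."""
    values = [sum(opponent_model[j] * PAYOFF[i][j] for j in range(3)) for i in range(3)]
    best = max(range(3), key=lambda i: values[i])
    return [1.0 if i == best else 0.0 for i in range(3)]


def mixed_strategy(opponent_model, p):
    """Play the best response with weight p and the equilibrium with weight 1 - p."""
    response = best_response(opponent_model)
    return [(1 - p) * e + p * r for e, r in zip(EQUILIBRIUM, response)]


def worst_case_payoff(ours):
    """Our payoff against the opponent action that hurts us most."""
    return min(sum(ours[i] * PAYOFF[i][j] for i in range(3)) for j in range(3))


model = [0.5, 0.25, 0.25]  # hypothetical opponent that overplays rock
strategy = mixed_strategy(model, p=0.5)
gain = expected_payoff(strategy, model)   # what we win if the model is right
risk = -worst_case_payoff(strategy)       # what we can lose if it is wrong
print(f"gain vs model: {gain:.3f}, worst-case loss: {risk:.3f}")
```

With `p = 0.5`, this mixture wins 0.125 per round against the assumed model (the pure best response would win 0.25) and can lose at most 0.5 per round if the model is wrong. Lowering `p` shrinks both numbers, which is the kind of gain-versus-safety dial that a bounded worst-case loss refers to.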