In this paper, we extend the Descent framework, which enables learning and planning in two-player games with perfect information, to stochastic games. We propose two ways of doing this: the first generalizes the search algorithm, i.e. Descent, to stochastic games, and the second approximates stochastic games by deterministic games. We then evaluate both approaches on the game EinStein würfelt nicht! against state-of-the-art algorithms: Expectiminimax and Polygames (i.e. the AlphaZero algorithm). Our generalization of Descent obtains the best results. The approximation by deterministic games nevertheless obtains good results, suggesting that it could give better results in particular contexts.
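For readers unfamiliar with search in stochastic games, the following is a minimal sketch of Expectiminimax, the classical baseline named above. It is not the generalization of Descent proposed in this paper. The node classes (`Leaf`, `Decision`, `Chance`) and the toy tree are hypothetical and serve only to illustrate how chance nodes (such as a die roll) are handled: their value is the probability-weighted average of their children, while decision nodes take the maximum or minimum.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    value: float  # payoff from the first player's point of view


@dataclass
class Decision:
    maximizing: bool          # True if the first player is to move
    children: List["Node"]


@dataclass
class Chance:
    outcomes: List[Tuple[float, "Node"]]  # (probability, child) pairs


Node = Union[Leaf, Decision, Chance]


def expectiminimax(node: Node) -> float:
    """Return the expectiminimax value of a (hypothetical) game tree."""
    if isinstance(node, Leaf):
        return node.value
    if isinstance(node, Chance):
        # Chance node: expected value over the random outcomes.
        return sum(p * expectiminimax(child) for p, child in node.outcomes)
    # Decision node: best value for the player to move.
    values = [expectiminimax(child) for child in node.children]
    return max(values) if node.maximizing else min(values)


# Toy tree: the first player chooses between two random events.
root = Decision(
    maximizing=True,
    children=[
        Chance([(0.5, Leaf(1.0)), (0.5, Leaf(-1.0))]),
        Chance([
            (0.5, Decision(maximizing=False, children=[Leaf(1.0), Leaf(0.0)])),
            (0.5, Leaf(1.0)),
        ]),
    ],
)

print(expectiminimax(root))
```

In this toy tree the first chance node has expected value 0.0 and the second 0.5, so the maximizing player prefers the second. A practical implementation for a game such as EinStein würfelt nicht! would additionally need a depth limit and an evaluation function, which this sketch omits.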