Spatial State-Action Features for General Games

In many board games and other abstract games, patterns have been used as features that can guide automated game-playing agents. Such patterns or features often represent particular configurations of pieces, empty positions, etc., which may be relevant to a game's strategies. Their use has been particularly prevalent in the game of Go, but also in many other games used as benchmarks for AI research. In this paper, we present a design and an efficient implementation of spatial state-action features for general games. These are patterns that can be trained to incentivise or disincentivise actions based on whether or not they match variables of the state in a local area around action variables. We provide extensive details on several design and implementation choices, with a primary focus on achieving a high degree of generality, so as to support a wide variety of games that use different board geometries or other graphs. We also propose an efficient approach for evaluating which features are active, for any given set of features. In this approach, we take inspiration from heuristics used in problems such as SAT to optimise the order in which parts of patterns are matched and to prune unnecessary evaluations. The approach is defined for a highly general and abstract description of the problem, phrased as optimising the order in which propositions of formulas in disjunctive normal form are evaluated, and may therefore also be of interest for problems other than board games. An empirical evaluation on 33 distinct games in the Ludii general game system demonstrates the efficiency of this approach in comparison to a naive baseline, as well as a baseline based on prefix trees, and demonstrates that the additional efficiency significantly improves the playing strength of agents that use the features to guide search.
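To make the abstract problem statement more concrete, the following is a minimal sketch of ordered proposition evaluation with pruning. It is not the algorithm from the paper: the function name `active_features`, the representation of each feature as a set of proposition identifiers, and the "most frequent proposition first" rule are all assumptions chosen for illustration. Each feature is treated as a conjunction of propositions, so the feature set as a whole resembles a formula in disjunctive normal form.

```python
from collections import Counter

def active_features(features, evaluate):
    """Return (set of active feature names, number of proposition evaluations).

    features: dict mapping a feature name to a set of proposition ids,
              interpreted as a conjunction (hypothetical representation).
    evaluate: callable taking a proposition id and returning True or False.
    """
    remaining = {name: set(props) for name, props in features.items()}
    active = set()
    num_evals = 0
    while remaining:
        # A feature with no propositions left to check is fully matched.
        for name in [n for n, props in remaining.items() if not props]:
            active.add(name)
            del remaining[name]
        if not remaining:
            break
        # Illustrative greedy heuristic (an assumption, not the paper's):
        # test the proposition shared by the most unresolved features.
        counts = Counter(p for props in remaining.values() for p in props)
        prop = counts.most_common(1)[0][0]
        num_evals += 1
        if evaluate(prop):
            # True: no feature needs to test this proposition again.
            for props in remaining.values():
                props.discard(prop)
        else:
            # False: prune every feature that requires this proposition.
            remaining = {n: props for n, props in remaining.items()
                         if prop not in props}
    return active, num_evals

# Hypothetical example: four features over four propositions.
features = {
    "A": {"p1", "p2"},
    "B": {"p1", "p3"},
    "C": {"p1"},
    "D": {"p4"},
}
truth = {"p1": False, "p2": True, "p3": True, "p4": True}
active, num_evals = active_features(features, truth.__getitem__)
print(active, num_evals)
```

In this example, `p1` is tested first because three features depend on it; since it is false, features A, B, and C are pruned at once, and only `p4` remains to be tested. Checking each feature independently, without sharing results, would evaluate `p1` three times.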

