A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

From arXiv. Revision notes: added theoretical results showing that Q* search is an admissible search algorithm, added comparisons to deferred heuristic evaluation, and added experiments with Lights Out and the 35-Pancake puzzle.

Efficiently solving problems with large action spaces using A* search has been important to the artificial intelligence community for decades, because the computation and memory requirements of A* search grow linearly with the size of the action space. This burden becomes even more apparent when A* search uses a heuristic function learned by a computationally expensive function approximator, such as a deep neural network. To address this problem, we introduce Q* search, a search algorithm that uses deep Q-networks to guide search. Q* search exploits the fact that the sums of the transition costs and heuristic values of all children of a node can be computed with a single forward pass through a deep Q-network, without explicitly generating those children. This significantly reduces computation time and requires only one node to be generated per iteration. We use Q* search to solve the Rubik's cube when formulated with a large action space that includes 1872 meta-actions, and find that this 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time and less than a 3-fold increase in the number of nodes generated by Q* search. Furthermore, Q* search is up to 129 times faster and generates up to 1288 times fewer nodes than A* search. Finally, although obtaining admissible heuristic functions from deep neural networks is an ongoing area of research, we prove that Q* search is guaranteed to find a shortest path given a heuristic function that neither overestimates the cost of a shortest path nor underestimates the transition cost.
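
A minimal Python sketch of the idea, assuming a hypothetical `q_fn(state)` that returns one value per action in a single call (standing in for the deep Q-network's forward pass) and a hypothetical `step(state, action)` that generates one child and its transition cost. OPEN holds (state, action) pairs ordered by g(n) + q(n, a), so a child is generated only when its pair is popped: one node per iteration. The goal handling (a separate entry keyed by the exact path cost) and the duplicate handling are choices of this sketch, not necessarily the paper's. With them, the returned path is shortest whenever q(n, a) <= c(n, a) + h*(child), where h* is the true cost-to-go. The toy domain and its hand-written q-values are illustrative only.

```python
import heapq
import itertools
import math
from typing import Callable, Hashable, List, Optional, Sequence, Tuple


def _reconstruct(parent, state) -> List[int]:
    # Follow parent pointers back to the start, collecting actions.
    actions = []
    while parent[state] is not None:
        state, a = parent[state]
        actions.append(a)
    return actions[::-1]


def q_star_search(
    start: Hashable,
    q_fn: Callable[[Hashable], Sequence[float]],  # hypothetical DQN stand-in
    step: Callable[[Hashable, int], Tuple[Hashable, float]],  # (child, cost)
    is_goal: Callable[[Hashable], bool],
) -> Optional[Tuple[float, List[int]]]:
    if is_goal(start):
        return 0.0, []
    tie = itertools.count()  # tie-breaker so the heap never compares states
    best_g = {start: 0.0}
    parent = {start: None}  # state -> (previous state, action)
    open_heap = []  # entries: (priority, tie, g, state, action or None)

    def push_actions(state, g):
        # One q_fn call scores every action of `state`; no children are built.
        for a, q in enumerate(q_fn(state)):
            heapq.heappush(open_heap, (g + q, next(tie), g, state, a))

    push_actions(start, 0.0)
    while open_heap:
        _, _, g, state, action = heapq.heappop(open_heap)
        if g > best_g[state]:
            continue  # stale entry: a cheaper path to `state` was found later
        if action is None:  # goal entry, keyed by its exact path cost
            return g, _reconstruct(parent, state)
        child, cost = step(state, action)  # the only node generated this iteration
        g_child = g + cost
        if g_child >= best_g.get(child, math.inf):
            continue  # no improvement over an earlier path to `child`
        best_g[child] = g_child
        parent[child] = (state, action)
        if is_goal(child):
            # Pops only once no remaining pair has a lower bound g + q below it.
            heapq.heappush(open_heap, (g_child, next(tie), g_child, child, None))
        else:
            push_actions(child, g_child)
    return None


# Hypothetical toy domain: walk the integer line with moves of -3, -1, +1, +3 at cost 1.
MOVES = (-3, -1, 1, 3)
TARGET = 10


def toy_step(s: int, a: int) -> Tuple[int, float]:
    return s + MOVES[a], 1.0


def toy_q(s: int) -> List[float]:
    # q[a] = 1 + ceil(|TARGET - child| / 3) never overestimates c + h*(child),
    # since no single move covers more than 3.
    return [1.0 + math.ceil(abs(TARGET - (s + d)) / 3) for d in MOVES]


cost, plan = q_star_search(0, toy_q, toy_step, lambda s: s == TARGET)
print(cost)  # 4.0
print([MOVES[a] for a in plan])
```

A trained network would produce the whole q vector from the parent state alone; `toy_q` computes each child's position explicitly only because it is a hand-written stand-in.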
