We study the problem of learning good heuristic functions for classical planning tasks with neural networks, using samples that consist of states paired with cost-to-goal estimates. The heuristic function is learned for a given state space and goal condition from a number of samples limited to a fraction of the size of the state space, and it must generalize well to all states of that state space under the same goal condition. Our main goal is to better understand how sample generation strategies influence the performance of a greedy best-first heuristic search (GBFS) guided by a learned heuristic function. In a set of controlled experiments, we find that two main factors determine the quality of the learned heuristic: which states are included in the sample set and the quality of the cost-to-goal estimates. These two factors are interdependent: perfect cost-to-goal estimates are insufficient if the samples are not well distributed across the state space. We also study other effects, such as adding samples with high-value estimates. Based on our findings, we propose practical strategies to improve the quality of learned heuristics: three strategies that aim to generate more representative states and two strategies that improve the cost-to-goal estimates. Our practical strategies almost double the mean coverage of a GBFS algorithm guided by a learned heuristic.
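To make the search setting concrete, the following is a minimal sketch of greedy best-first search driven by a pluggable heuristic. It is an illustration under stated assumptions, not the authors' implementation: the names `gbfs`, `successors`, and `learned_h` are hypothetical, the task is a toy number-line problem rather than a planning benchmark, and `learned_h` is a hand-written stand-in for the trained neural network.

```python
import heapq
from itertools import count


def gbfs(initial, is_goal, successors, h):
    """Greedy best-first search: always expand the open state with the lowest h-value.

    Returns the list of states from `initial` to a goal state, or None if no
    goal state is reachable.
    """
    tie = count()  # tie-breaker so the heap never has to compare states directly
    open_list = [(h(initial), next(tie), initial)]
    parent = {initial: None}  # doubles as the set of already generated states
    while open_list:
        _, _, state = heapq.heappop(open_list)
        if is_goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                heapq.heappush(open_list, (h(nxt), next(tie), nxt))
    return None


# Hypothetical toy task: states are integers in [0, 20], moves are +1, -1, +3.
def successors(s):
    return [n for n in (s + 1, s - 1, s + 3) if 0 <= n <= 20]


# Assumption: stands in for a learned cost-to-goal estimate for the goal s == 10.
def learned_h(s):
    return abs(10 - s)


plan = gbfs(0, lambda s: s == 10, successors, learned_h)
print(plan)
```

Because GBFS orders expansions by the heuristic value alone, the quality of `h` across the whole state space directly determines how many states are expanded before a goal is found, which is why the choice of training samples matters.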