We study the problem of learning good heuristic functions for classical planning tasks with neural networks, based on samples represented by states with their cost-to-goal estimates. The heuristic function is learned for a state space and goal condition, with the number of samples limited to a fraction of the size of the state space, and must generalize well to all states of the state space with the same goal condition. Our main goal is to better understand the influence of sample generation strategies on the performance of greedy best-first search (GBFS) guided by a learned heuristic function. In a set of controlled experiments, we find that two main factors determine the quality of the learned heuristic: the algorithm used to generate the sample set and how close the sample estimates are to the perfect cost-to-goal values. These two factors are interdependent: having perfect cost-to-goal estimates is insufficient if the samples are not well distributed across the state space. We also study other effects, such as adding samples with high-value estimates. Based on our findings, we propose practical strategies to improve the quality of learned heuristics: three strategies that aim to generate more representative states and two strategies that improve the cost-to-goal estimates. When guiding a GBFS algorithm, the heuristic learned with our practical strategies increases mean coverage by more than 30% compared to a baseline learned heuristic.
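The search procedure that the learned heuristic guides is standard greedy best-first search: always expand the open state with the lowest heuristic value. A minimal sketch in Python, assuming a hypothetical heuristic callable `h` (e.g. the trained network's cost-to-goal prediction for a state); the function names and signatures here are illustrative, not the paper's implementation:

```python
import heapq
import itertools

def gbfs(initial, is_goal, successors, h):
    """Greedy best-first search: repeatedly expand the frontier state
    with the smallest h(s). h is any cost-to-goal estimator, such as a
    learned neural heuristic (hypothetical callable)."""
    counter = itertools.count()  # tie-breaker so heapq never compares states
    frontier = [(h(initial), next(counter), initial)]
    parents = {initial: None}  # also serves as the closed/seen set
    while frontier:
        _, _, s = heapq.heappop(frontier)
        if is_goal(s):
            # Reconstruct the path from the initial state to the goal.
            path = []
            while s is not None:
                path.append(s)
                s = parents[s]
            return path[::-1]
        for t in successors(s):
            if t not in parents:
                parents[t] = s
                heapq.heappush(frontier, (h(t), next(counter), t))
    return None  # no goal state reachable from the initial state

# Toy usage: states are integers on a line, the goal is 5, and the
# heuristic is the exact distance to the goal (a "perfect" estimate).
path = gbfs(0, lambda s: s == 5,
            lambda s: [s - 1, s + 1],
            lambda s: abs(5 - s))
```

With a perfect heuristic, GBFS expands only states on an optimal path; the quality of the learned estimator therefore directly controls how much of the state space the search visits.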