In recent years, there has been growing interest in using modern machine learning techniques to learn heuristic functions for forward search algorithms. Despite this, there is little theoretical understanding of what such models should learn, how to train them, and why. This lack of understanding has led to the adoption of diverse training targets (suboptimal costs, optimal costs, or admissible heuristics) and loss functions (e.g., squared vs. absolute errors) in the literature. In this work, we focus on how to effectively use the information provided by admissible heuristics in heuristic learning. We argue that learning from polynomial-time admissible heuristics by minimizing the mean squared error (MSE) is not the correct approach, since the result is merely a noisy, inadmissible copy of an efficiently computable heuristic. Instead, we propose to model the learned heuristic as a truncated Gaussian, where admissible heuristics are used not as training targets but as lower bounds of this distribution. This results in a loss function different from the MSE commonly employed in the literature, which implicitly models the learned heuristic as a Gaussian distribution. We conduct experiments in which both MSE and our novel loss function are applied to learning a heuristic from optimal plan costs. Results show that our proposed method converges faster during training and yields better heuristics.
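To make the contrast between the two losses concrete, the following is a minimal sketch of a per-sample negative log-likelihood (NLL) under a plain Gaussian and under a Gaussian truncated from below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the scalar interface, and the choice of a lower bound only (no upper bound) are hypothetical, and the abstract does not specify how the mean and standard deviation are parameterized.

```python
import math


def gaussian_nll(target, mu, sigma):
    """NLL of `target` under N(mu, sigma^2).

    With sigma held fixed, minimizing this over mu is equivalent to
    minimizing the squared error, up to additive and scaling constants.
    """
    z = (target - mu) / sigma
    return 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)


def truncated_gaussian_nll(target, mu, sigma, lower):
    """NLL of `target` under N(mu, sigma^2) truncated to [lower, +inf).

    `lower` plays the role of an admissible heuristic value (assumption:
    it is used only as a lower bound, never as a training target).
    """
    if target < lower:
        raise ValueError("target lies below the lower bound; density is zero")
    alpha = (lower - mu) / sigma
    # Probability mass of the untruncated Gaussian on [lower, +inf),
    # i.e. 1 - Phi(alpha), computed via the complementary error function.
    # Note: this underflows to 0 for very large alpha; a production
    # implementation would need a numerically stable log-tail.
    tail_mass = 0.5 * math.erfc(alpha / math.sqrt(2.0))
    # Truncated density = Gaussian density / tail_mass, so the NLL gains
    # a + log(tail_mass) term, which is never positive.
    return gaussian_nll(target, mu, sigma) + math.log(tail_mass)
```

In this sketch, the extra `log(tail_mass)` term is what distinguishes the truncated loss from the Gaussian one: it depends on how far the predicted mean sits from the lower bound, so the bound influences the gradient without ever being regressed toward.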