Although learning heuristic functions for forward search algorithms with modern machine learning techniques has attracted growing interest in recent years, there is still little theoretical understanding of \emph{what} these heuristics should learn, \emph{how} to train them, and \emph{why} we do so. As a result, existing work selects training datasets (suboptimal versus optimal costs, or admissible versus inadmissible heuristics) and optimization metrics (e.g., squared versus absolute error) in an ad-hoc manner. Moreover, because the resulting trained heuristics are not admissible, little attention has been paid to the role of admissibility \emph{during} learning. This paper articulates the role of admissible heuristics in supervised heuristic learning by using them as parameters of Truncated Gaussian distributions, which tightens the hypothesis space compared to ordinary Gaussian distributions. We argue that this mathematical model faithfully follows the principle of maximum entropy and show empirically that, as a result, it yields more accurate heuristics and converges faster during training.
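To make the modelling choice concrete, the following minimal sketch contrasts the negative log-likelihood (NLL) of an ordinary Gaussian with that of a Truncated Gaussian. It rests on assumptions not stated above: that the admissible heuristic value serves as the lower truncation bound (an admissible heuristic never overestimates the optimal cost), that the upper bound defaults to infinity, and that `mu` and `sigma` stand in for the outputs of a learned model. All function names are hypothetical and are not taken from the paper's implementation.

```python
import math


def std_normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_nll(x, mu, sigma):
    """Negative log-likelihood of x under an untruncated N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)


def truncated_gaussian_nll(x, mu, sigma, lower, upper=math.inf):
    """NLL of x under N(mu, sigma^2) truncated to [lower, upper].

    Assumption: `lower` is the admissible heuristic value for the state.
    Numerically naive: math.log raises ValueError if the probability mass
    inside the bounds underflows to zero.
    """
    if not (lower <= x <= upper):
        return math.inf  # targets outside the support have zero density
    # Probability mass the untruncated Gaussian places inside the bounds.
    mass = std_normal_cdf((upper - mu) / sigma) - std_normal_cdf((lower - mu) / sigma)
    # Renormalizing by `mass` raises the density inside the support,
    # so the NLL is the Gaussian NLL plus log(mass), and log(mass) <= 0.
    return gaussian_nll(x, mu, sigma) + math.log(mass)


# Hypothetical values: optimal cost 12, predicted mean 10, scale 3,
# admissible heuristic value 8 used as the lower bound.
plain = gaussian_nll(12.0, 10.0, 3.0)
truncated = truncated_gaussian_nll(12.0, 10.0, 3.0, lower=8.0)
print(plain, truncated)
```

Because the truncated density assigns no probability to costs below the lower bound, it concentrates more mass on the feasible range, which is one way to read the claim that truncation tightens the hypothesis space.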