With the advent of machine learning, there have been several recent attempts to learn effective and generalizable heuristics. Local Heuristic A* (LoHA*) is one recent method that, instead of learning the entire heuristic estimate, learns a "local" residual heuristic that estimates the cost to escape a region (Veerapaneni et al. 2023). LoHA*, like other supervised learning methods, collects a dataset of target values by querying an oracle on many planning problems (in this case, local planning problems). This data collection process can become slow as the size of the local region increases or if the domain requires expensive collision checks. Our main insight is that when an A* search solves a start-goal planning problem, it inherently solves multiple local planning problems along the way. We exploit this observation to propose an efficient data collection framework that needs less than one-tenth of the work (measured in node expansions) that baselines need to collect the same amount of data. This idea also enables us to run LoHA* online, iteratively collecting data and improving our model while solving relevant start-goal tasks. We demonstrate the performance of our data collection and online frameworks on a 4D $(x, y, \theta, v)$ navigation domain.