With the advent of machine learning, there have been several recent attempts to learn effective and generalizable heuristics. Local Heuristic A* (LoHA*) is one recent method that, instead of learning the entire heuristic estimate, learns a "local" residual heuristic that estimates the cost to escape a region (Veerapaneni et al., 2023). LoHA*, like other supervised learning methods, collects a dataset of target values by querying an oracle on many planning problems (in this case, local planning problems). This data collection process can become slow as the size of the local region increases or if the domain requires expensive collision checks. Our main insight is that when an A* search solves a start-goal planning problem, it inherently solves multiple local planning problems. We exploit this observation to propose an efficient data collection framework that does less than one tenth of the work of baselines (measured by expansions) to collect the same amount of data. This idea also enables us to run LoHA* online, iteratively collecting data and improving our model while solving relevant start-goal tasks. We demonstrate the performance of our data collection and online framework on a 4D $(x, y, \theta, v)$ navigation domain.
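To make the main insight concrete, the following is a minimal sketch and not the data collection procedure of the paper itself. It assumes a 2D, 4-connected, unit-cost grid rather than the 4D domain, a Manhattan heuristic, and a square local region of half-width `radius`. The helper names `astar` and `local_samples_from_path` are hypothetical. Because every sub-path of an optimal path is itself optimal, a single start-goal search yields, for each state on the solution path, the optimal cost to the first path state that leaves that state's local region. Whether this matches the exact target value used by LoHA* is an assumption of the sketch.

```python
import heapq


def astar(grid, start, goal):
    """Unit-cost, 4-connected A* with a Manhattan heuristic.

    Returns (path, g, expansions); path is None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(s):
        return abs(s[0] - goal[0]) + abs(s[1] - goal[1])

    g = {start: 0}
    parent = {start: None}
    open_list = [(h(start), 0, start)]
    closed = set()
    expansions = 0
    while open_list:
        _, g_s, s = heapq.heappop(open_list)
        if s in closed:
            continue  # stale queue entry
        closed.add(s)
        expansions += 1
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1], g, expansions
        r, c = s
        for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= n[0] < rows and 0 <= n[1] < cols and grid[n[0]][n[1]] == 0:
                new_g = g_s + 1
                if new_g < g.get(n, float("inf")):
                    g[n] = new_g
                    parent[n] = s
                    heapq.heappush(open_list, (new_g + h(n), new_g, n))
    return None, g, expansions


def local_samples_from_path(path, g, radius):
    """Reuse one global solution to label many local problems (illustrative).

    For each state s on the path, find the first later path state t outside
    the square window of half-width `radius` around s, and record the
    optimal cost g[t] - g[s] of the sub-path between them.
    """
    samples = []
    for i, s in enumerate(path):
        for t in path[i + 1:]:
            if max(abs(t[0] - s[0]), abs(t[1] - s[1])) > radius:
                samples.append((s, t, g[t] - g[s]))
                break
    return samples


# Usage: one search on an empty 5x5 grid produces several local samples.
grid = [[0] * 5 for _ in range(5)]
path, g, expansions = astar(grid, (0, 0), (4, 4))
samples = local_samples_from_path(path, g, radius=2)
```

In this sketch the labels come at no extra search cost beyond the start-goal query, which is the source of the savings in expansions; an oracle-based baseline would instead run a separate local search per sample.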