In task and motion planning (TAMP), high-level task planning is performed over an abstraction of the world to enable efficient search in long-horizon robotics problems. However, the feasibility of these task-level plans relies on the downward refinability of the abstraction into continuous motion. When a domain's refinability is poor, task-level plans that appear valid may ultimately fail during motion planning, requiring replanning and resulting in slower overall performance. Prior work mitigates this by encoding refinement issues as constraints that prune infeasible task plans. However, these approaches only add constraints upon refinement failure, expending significant search effort on infeasible branches. We propose VIZ-COAST, a method that leverages the common-sense spatial reasoning of large pretrained Vision-Language Models to identify downward refinement issues a priori, bypassing the need to fix these failures during planning. Experiments on two challenging TAMP domains show that our approach extracts plausible constraints from images and domain descriptions, drastically reducing planning times and, in some cases, eliminating downward refinement failures altogether, while generalizing to a diverse range of instances from the broader domain.