Motion planning involves determining a sequence of robot configurations to reach a desired pose, subject to movement and safety constraints. Traditional motion planning finds collision-free paths, but this is overly restrictive in clutter, where it may not be possible for a robot to accomplish a task without contact. In addition, contacts range from relatively benign (e.g., brushing a soft pillow) to more dangerous (e.g., toppling a glass vase). Due to this diversity, it is difficult to characterize which contacts may be acceptable or unacceptable. In this paper, we propose IMPACT, a novel motion planning framework that uses Vision-Language Models (VLMs) to infer environment semantics, identifying which parts of the environment can best tolerate contact based on object properties and locations. Our approach uses the VLM's outputs to produce a dense 3D "cost map" that encodes contact tolerances and integrates seamlessly with standard motion planners. We perform experiments on 20 simulation and 10 real-world scenes, evaluating performance using task success rate, object displacement, and feedback from human evaluators. Our results over 3620 simulation and 200 real-world trials suggest that IMPACT enables efficient contact-rich motion planning in cluttered settings while outperforming alternative methods and ablations. Supplementary material is available at https://impact-planning.github.io/.