Motion planning involves determining a sequence of robot configurations to reach a desired pose, subject to movement and safety constraints. Traditional motion planning finds collision-free paths, but this is overly restrictive in clutter, where it may not be possible for a robot to accomplish a task without contact. In addition, contacts range from relatively benign (e.g., brushing a soft pillow) to more dangerous (e.g., toppling a glass vase). Due to this diversity, it is difficult to characterize which contacts may be acceptable or unacceptable. In this paper, we propose IMPACT, a novel motion planning framework that uses Vision-Language Models (VLMs) to infer environment semantics, identifying which parts of the environment can best tolerate contact based on object properties and locations. Our approach uses the VLM's outputs to produce a dense 3D "cost map" that encodes contact tolerances and integrates seamlessly with standard motion planners. We perform experiments on 20 simulation and 10 real-world scenes, evaluating performance using task success rate, object displacement, and feedback from human evaluators. Our results over 3620 simulation and 200 real-world trials suggest that IMPACT enables efficient contact-rich motion planning in cluttered settings while outperforming alternative methods and ablations. Supplementary material is available at https://impact-planning.github.io/.