Representing robotic manipulation tasks as constraints that associate the robot with the environment is a promising way to encode desired robot behaviors. However, it remains unclear how to formulate constraints that are 1) applicable to diverse tasks, 2) free of manual labeling, and 3) optimizable by off-the-shelf solvers to produce robot actions in real time. In this work, we introduce Relational Keypoint Constraints (ReKep), a visually-grounded representation for constraints in robotic manipulation. Specifically, ReKep is expressed as Python functions that map a set of 3D keypoints in the environment to a numerical cost. We demonstrate that, by representing a manipulation task as a sequence of Relational Keypoint Constraints, we can employ a hierarchical optimization procedure to solve for robot actions (represented as a sequence of end-effector poses in SE(3)) with a perception-action loop running at real-time frequency. Furthermore, to circumvent the need for manual specification of ReKep for each new task, we devise an automated procedure that leverages large vision models and vision-language models to produce ReKep from free-form language instructions and RGB-D observations. We present system implementations on a wheeled single-arm platform and a stationary dual-arm platform that perform a large variety of manipulation tasks, featuring multi-stage, in-the-wild, bimanual, and reactive behaviors, all without task-specific data or environment models. Website at https://rekep-robot.github.io/.
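To make the formulation concrete, the following is a minimal illustrative sketch of what such a constraint function might look like, assuming a hypothetical pouring task where keypoint 0 is a teapot spout and keypoint 1 is a cup opening; the function name, keypoint indices, and tolerance are invented for illustration and are not taken from the paper:

```python
import numpy as np

def stage2_subgoal_constraint(keypoints: np.ndarray) -> float:
    """Hypothetical ReKep-style constraint (illustrative sketch, not from
    the paper): the spout (keypoint 0) should be 5 cm directly above the
    cup opening (keypoint 1).

    Args:
        keypoints: (N, 3) array of tracked 3D keypoint positions.

    Returns:
        A scalar cost that is <= 0 when the constraint is satisfied.
    """
    spout, cup = keypoints[0], keypoints[1]
    target = cup + np.array([0.0, 0.0, 0.05])  # point 5 cm above the cup
    # Cost is the distance to the target minus a 1 cm tolerance.
    return float(np.linalg.norm(spout - target) - 0.01)
```

A solver would then optimize end-effector poses so that the keypoints attached to the grasped object drive this cost to zero or below, with one such function (or several) per task stage.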