Previous studies on robotic manipulation rest on a limited understanding of the underlying 3D motion constraints and affordances. To address this limitation, we propose a comprehensive paradigm, termed UniAff, that integrates 3D object-centric manipulation and task understanding in a unified formulation. Specifically, we construct a dataset labeled with manipulation-related key attributes, comprising 900 articulated objects from 19 categories and 600 tools from 12 categories. Furthermore, we leverage multimodal large language models (MLLMs) to infer object-centric representations for manipulation tasks, including affordance recognition and reasoning about 3D motion constraints. Comprehensive experiments in both simulation and real-world settings show that UniAff significantly improves the generalization of robotic manipulation to tools and articulated objects. We hope that UniAff will serve as a general baseline for unified robotic manipulation tasks in the future. Images, videos, the dataset, and code are published on the project website: https://sites.google.com/view/uni-aff/home