Assembly hinges on reliably forming connections between parts, yet most robotic approaches plan assembly sequences and part poses while treating connectors as an afterthought. Connections are the fundamental physical constraints of assembly execution: task planning determines the order of operations, but whether these constraints are established precisely ultimately determines whether the assembly succeeds. In this paper, we treat connections as explicit, primary entities in the assembly representation, directly encoding connector types, specifications, and locations for every assembly step. Drawing inspiration from how humans learn assembly tasks through step-by-step instruction manuals, we present Manual2Skill++, a vision-language framework that automatically extracts structured connection information from assembly manuals. We encode assembly tasks as hierarchical graphs in which nodes represent parts and sub-assemblies, and edges explicitly model the connection relationships between components. A large-scale vision-language model parses the symbolic diagrams and annotations in manuals to instantiate these graphs, leveraging the rich connection knowledge embedded in human-designed instructions. We curate a dataset of over 20 assembly tasks with diverse connector types to validate our representation extraction approach, and we evaluate the complete pipeline, from task understanding to execution, on four complex assembly scenarios in simulation, spanning furniture, toys, and manufacturing components that have real-world counterparts. More details are available at https://nus-lins-lab.github.io/Manual2SkillPP/
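To make the hierarchical graph representation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each edge stores a connector type, a specification, and a location, as the abstract describes. The class names (`Connection`, `AssemblyNode`), the field layout, the stool example, and the rule that sub-assemblies are completed before their parent are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Connection:
    """Edge payload: which connector joins two components, and where.

    All field names are hypothetical; the paper's actual schema may differ.
    """
    part_a: str
    part_b: str
    connector_type: str                   # e.g. "screw" or "dowel" (illustrative)
    spec: str                             # e.g. "M4x20" (illustrative)
    location: Tuple[float, float, float]  # position in an assumed local frame


@dataclass
class AssemblyNode:
    """A single part (no children) or a sub-assembly (children plus edges)."""
    name: str
    children: List["AssemblyNode"] = field(default_factory=list)
    connections: List[Connection] = field(default_factory=list)

    def parts(self) -> List[str]:
        """Return the names of all leaf parts under this node, left to right."""
        if not self.children:
            return [self.name]
        return [p for child in self.children for p in child.parts()]

    def connections_in_build_order(self) -> List[Connection]:
        """List connections so that every sub-assembly is finished first.

        This post-order traversal is an assumed ordering rule for the sketch.
        """
        ordered: List[Connection] = []
        for child in self.children:
            ordered.extend(child.connections_in_build_order())
        ordered.extend(self.connections)
        return ordered


# Hypothetical example: a stool built from a leg frame and a seat.
leg_frame = AssemblyNode(
    "leg_frame",
    children=[
        AssemblyNode("leg_left"),
        AssemblyNode("leg_right"),
        AssemblyNode("crossbar"),
    ],
    connections=[
        Connection("leg_left", "crossbar", "dowel", "8x30", (0.0, 0.0, 0.2)),
        Connection("leg_right", "crossbar", "dowel", "8x30", (0.3, 0.0, 0.2)),
    ],
)
stool = AssemblyNode(
    "stool",
    children=[leg_frame, AssemblyNode("seat")],
    connections=[
        Connection("leg_frame", "seat", "screw", "M4x20", (0.15, 0.0, 0.4)),
    ],
)

for c in stool.connections_in_build_order():
    print(f"{c.part_a} -> {c.part_b}: {c.connector_type} {c.spec} at {c.location}")
```

Storing the connector data on the edges rather than on the parts mirrors the abstract's emphasis on connections as primary entities: an executor can walk the edges in order and knows, for each step, which connector to use and where to place it.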