Robust generalization in robotic manipulation is crucial for robots to adapt flexibly to diverse environments. Existing methods usually improve generalization by scaling data and networks, but they model tasks independently and overlook skill-level information. Observing that tasks within the same skill share similar motion patterns, we propose Skill-Aware Diffusion (SADiff), which explicitly incorporates skill-level information to improve generalization. SADiff learns skill-specific representations through a skill-aware encoding module with learnable skill tokens, and conditions a skill-constrained diffusion model on them to generate object-centric motion flow. A skill-retrieval transformation strategy further exploits skill-specific trajectory priors to refine the mapping from 2D motion flow to executable 3D actions. In addition, we introduce IsaacSkill, a high-fidelity dataset of fundamental robotic skills for comprehensive evaluation and sim-to-real transfer. Experiments in both simulation and real-world settings show that SADiff achieves strong performance and generalization across diverse manipulation tasks. Code, data, and videos are available at https://sites.google.com/view/sa-diff.