We present a low-cost data generation pipeline that integrates physics-based simulation, human demonstrations, and model-based planning to efficiently generate large-scale, high-quality datasets for contact-rich robotic manipulation tasks. Starting from a small number of embodiment-flexible human demonstrations collected in a virtual reality simulation environment, the pipeline refines these demonstrations using optimization-based kinematic retargeting and trajectory optimization to adapt them across robot embodiments and physical parameters. This process yields a diverse, physically consistent dataset that enables cross-embodiment data transfer and offers the potential to reuse legacy datasets collected under different hardware configurations or physical parameters. We validate the pipeline's effectiveness by training diffusion policies on the generated datasets for challenging contact-rich manipulation tasks across multiple robot embodiments, including a floating Allegro hand and bimanual robot arms. The trained policies are deployed zero-shot on bimanual iiwa arm hardware, achieving high success rates with minimal human input. Project website: https://lujieyang.github.io/physicsgen/.