One-Shot Real-World Demonstration Synthesis for Scalable Bimanual Manipulation

Learning dexterous bimanual manipulation policies critically depends on large-scale, high-quality demonstrations, yet current paradigms face inherent trade-offs: teleoperation provides physically grounded data but is prohibitively labor-intensive, while simulation-based synthesis scales efficiently but suffers from sim-to-real gaps. We present BiDemoSyn, a framework that synthesizes contact-rich, physically feasible bimanual demonstrations from a single real-world example. The key idea is to decompose each task into invariant coordination blocks and variable, object-dependent adjustments, then adapt them through vision-guided alignment and lightweight trajectory optimization. This enables the generation of thousands of diverse and feasible demonstrations within a few hours, without repeated teleoperation or reliance on imperfect simulation. Across six dual-arm tasks, we show that policies trained on BiDemoSyn data generalize robustly to novel object poses and shapes, significantly outperforming recent strong baselines. Beyond the one-shot setting, BiDemoSyn naturally extends to few-shot synthesis, improving object-level diversity and out-of-distribution generalization while maintaining strong data efficiency. Moreover, policies trained on BiDemoSyn data exhibit zero-shot cross-embodiment transfer to new robotic platforms, enabled by object-centric observations and a simplified 6-DoF end-effector action representation that decouples policies from embodiment-specific dynamics. By bridging the gap between efficiency and real-world fidelity, BiDemoSyn provides a scalable path toward practical imitation learning for complex bimanual manipulation without compromising physical grounding.
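
To make the object-centric adaptation idea concrete, the following is a minimal sketch of how a recorded end-effector trajectory could be re-anchored to a new object pose with a rigid transform. It is not the BiDemoSyn implementation: the function names (`pose_z`, `invert_rigid`, `retarget`) are hypothetical, poses are assumed to be 4x4 homogeneous matrices in a shared world frame, and the new object pose is assumed to come from some vision-based estimator that is not shown. The trajectory optimization step mentioned above is also omitted.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(t):
    """Invert a rigid transform [R p; 0 1] as [R^T, -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def pose_z(yaw, x, y, z):
    """Helper (illustrative only): pose with a rotation about z plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def retarget(ee_demo, obj_demo, obj_new):
    """Re-anchor end-effector waypoints from the demo object pose to a new one.

    Each waypoint keeps its pose relative to the object:
    T_ee_new = T_obj_new * inv(T_obj_demo) * T_ee_demo.
    """
    delta = matmul(obj_new, invert_rigid(obj_demo))
    return [matmul(delta, wp) for wp in ee_demo]

# Assumed example values: the object is shifted 0.2 m in y and rotated 90 degrees.
obj_demo = pose_z(0.0, 0.5, 0.0, 0.0)
obj_new = pose_z(math.pi / 2, 0.5, 0.2, 0.0)
ee_demo = [pose_z(0.0, 0.6, 0.0, 0.1)]  # one waypoint, 0.1 m ahead of and above the object
ee_new = retarget(ee_demo, obj_demo, obj_new)
```

Because the transform is applied per waypoint, the same function could be called once per arm. In this sketch, segments whose relative motion between the two arms must stay fixed would share one `delta`, which is one simple way to read the "invariant coordination block" idea; whether the framework does this is not stated in the abstract.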
