The Real-to-Sim-to-Real paradigm is gaining increasing interest in robotic manipulation, as it can generate scalable data in simulation while maintaining a narrow sim-to-real gap. However, previous methods have mainly focused on environment-level visual real-to-sim transfer, ignoring the transfer of interactions, which can be challenging and inefficient to obtain purely in simulation, especially for contact-rich tasks. We propose ExoGS, a robot-free 4D Real-to-Sim-to-Real framework that captures both static environments and dynamic interactions in the real world and transfers them seamlessly to a simulated environment, providing a new solution for scalable manipulation data collection and policy learning. ExoGS employs a custom-designed, robot-isomorphic passive exoskeleton, AirExo-3, to capture kinematically consistent trajectories with millimeter-level accuracy, along with synchronized RGB observations, during direct human demonstrations. The robot, objects, and environment are reconstructed as editable 3D Gaussian Splatting assets, enabling geometry-consistent replay and large-scale data augmentation. Additionally, a lightweight Mask Adapter injects instance-level semantics into the policy to enhance robustness under visual domain shifts. Real-world experiments demonstrate that ExoGS significantly improves data efficiency and policy generalization compared to teleoperation-based baselines. Code and hardware files are available at https://github.com/zaixiabalala/ExoGS.