TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation

A critical bottleneck limiting imitation learning in robotics is the lack of data. This problem is more severe in mobile manipulation, where collecting demonstrations is harder than in stationary manipulation because available, easy-to-use teleoperation interfaces are lacking. In this work, we demonstrate TeleMoMa, a general and modular interface for whole-body teleoperation of mobile manipulators. TeleMoMa unifies multiple human interfaces, including RGB and depth cameras, virtual reality controllers, keyboards, and joysticks, and supports any combination thereof. In its more accessible version, TeleMoMa works using vision alone (e.g., an RGB-D camera), lowering the barrier to entry for humans to provide mobile manipulation demonstrations. We demonstrate the versatility of TeleMoMa by teleoperating several existing mobile manipulators (PAL Tiago++, Toyota HSR, and Fetch) in simulation and the real world. We demonstrate the quality of the demonstrations collected with TeleMoMa by training imitation learning policies for mobile manipulation tasks involving synchronized whole-body motion. Finally, we show that TeleMoMa's teleoperation channel enables teleoperation either on site, looking at the robot, or remotely, sending commands and observations through a computer network. We also perform user studies to evaluate how easily novice users learn to collect demonstrations with the different combinations of human interfaces enabled by our system. We hope TeleMoMa becomes a helpful tool for the community, enabling researchers to collect whole-body mobile manipulation demonstrations. For more information and video results, see https://robin-lab.cs.utexas.edu/telemoma-web.
