TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation

A critical bottleneck limiting imitation learning in robotics is the lack of data. This problem is more severe in mobile manipulation, where collecting demonstrations is harder than in stationary manipulation because available, easy-to-use teleoperation interfaces are lacking. In this work, we present TeleMoMa, a general and modular interface for whole-body teleoperation of mobile manipulators. TeleMoMa unifies multiple human interfaces, including RGB and depth cameras, virtual reality controllers, keyboards, and joysticks, and supports any combination thereof. In its most accessible version, TeleMoMa works using vision alone (e.g., an RGB-D camera), lowering the barrier to entry for humans to provide mobile manipulation demonstrations. We demonstrate the versatility of TeleMoMa by teleoperating several existing mobile manipulators (PAL Tiago++, Toyota HSR, and Fetch) in simulation and the real world. We validate the quality of the demonstrations collected with TeleMoMa by training imitation learning policies for mobile manipulation tasks involving synchronized whole-body motion. Finally, we show that TeleMoMa's teleoperation channel supports both on-site teleoperation, where the operator looks directly at the robot, and remote teleoperation, where commands and observations are sent through a computer network. We also perform user studies to evaluate how easily novice users learn to collect demonstrations with the different combinations of human interfaces enabled by our system. We hope TeleMoMa becomes a helpful tool for the community, enabling researchers to collect whole-body mobile manipulation demonstrations. For more information and video results, see https://robin-lab.cs.utexas.edu/telemoma-web.
