SPIN: Simultaneous Perception, Interaction and Navigation

While there has been remarkable recent progress in manipulation and locomotion, mobile manipulation remains a long-standing challenge. Compared to locomotion or static manipulation, a mobile system must accomplish a diverse range of long-horizon tasks in unstructured and dynamic environments. While the applications are broad and interesting, developing these systems poses a plethora of challenges. These include coordinating the base and arm, relying on onboard perception to perceive and interact with the environment, and, most importantly, integrating all of these parts simultaneously. Prior works approach the problem with disentangled, modular skills for mobility and manipulation that are naively chained together. This causes several limitations, such as compounding errors, delays in decision-making, and a lack of whole-body coordination. In this work, we present a reactive mobile manipulation framework that uses an active visual system to consciously perceive and react to its environment. Similar to how humans leverage whole-body and hand-eye coordination, we develop a mobile manipulator that exploits its ability to move and see; more specifically, it moves in order to see and sees in order to move. This allows it not only to move around and interact with its environment but also to choose "when" to perceive "what" using an active visual system. We observe that such an agent learns to navigate complex, cluttered scenarios while displaying agile whole-body coordination, using only ego-vision and without needing to build environment maps. Result visualizations and videos are available at https://spin-robot.github.io/

翻译：尽管近年来在操作与运动领域取得了显著进展，但移动操作仍是一个长期挑战。与单纯的运动或静态操作不同，移动系统必须在非结构化动态环境中完成多种多样的长时域任务。虽然其应用范围广阔且富有前景，但在开发这类系统时面临诸多挑战，例如基座与机械臂的协调、依赖机载感知系统实现环境感知与交互，以及最关键的是将所有组件同步整合。先前的研究通常采用解耦的分模块化技能来处理移动与操作问题，但这些模块仅被简单串联，导致误差累积、决策延迟以及缺乏全身协调等局限。在本工作中，我们提出一种具有反应能力的移动操作框架，它利用主动视觉系统有意识地感知并响应环境。借鉴人类运用全身协调与手眼协同的方式，我们开发了一款移动操作器，充分发挥其移动与视觉能力——具体而言，通过移动实现感知，通过感知引导移动。这使其不仅能自由移动并与环境交互，还能借助主动视觉系统自主选择"何时"感知"何物"。实验表明，这类智能体仅依靠本体视觉即可在复杂杂乱场景中导航，同时展现敏捷的全身协调能力，且无需构建环境地图。结果可视化及演示视频见 https://spin-robot.github.io/