Vision-based locomotion has shown great promise in enabling legged robots to perceive and adapt to complex environments. However, visual information is inherently fragile: it is vulnerable to occlusions, reflections, and lighting changes, which often destabilize locomotion. Inspired by animal sensorimotor integration, we propose KiVi, a Kinesthetic-Visuospatial integration framework, in which kinesthetics encodes proprioceptive sensing of body motion and visuospatial reasoning captures visual perception of the surrounding terrain. Specifically, KiVi separates these two pathways, leveraging proprioception as a stable backbone while selectively incorporating vision for terrain awareness and obstacle avoidance. This modality-balanced yet integrative design, combined with memory-enhanced attention, allows the robot to interpret visual cues robustly while retaining fallback stability through proprioception. Extensive experiments show that our method enables quadruped robots to traverse diverse terrains stably and operate reliably in unstructured outdoor environments, remaining robust to out-of-distribution (OOD) visual noise and occlusions unseen during training, thereby highlighting its effectiveness and applicability to real-world legged locomotion.