VIGOR: Visual Goal-In-Context Inference for Unified Humanoid Fall Safety

Reliable fall recovery is critical for humanoids operating in cluttered environments. Unlike quadrupeds or wheeled robots, humanoids experience high-energy impacts, complex whole-body contact, and large viewpoint changes during a fall, making recovery essential for continued operation. Existing methods either fragment fall safety into separate problems such as fall avoidance, impact mitigation, and stand-up recovery, or rely on end-to-end policies trained without vision through reinforcement learning or imitation learning, often on flat terrain. At a deeper level, they treat fall safety as a monolithic data-complexity problem that couples pose, dynamics, and terrain and requires exhaustive coverage, which limits scalability and generalization. We present a unified fall safety approach that spans all phases of fall recovery. It builds on two insights: 1) natural human fall and recovery poses are highly constrained and transferable from flat to complex terrain through alignment, and 2) fast whole-body reactions require integrated perceptual-motor representations. We train a privileged teacher using sparse human demonstrations on flat terrain and simulated complex terrains, and distill it into a deployable student that relies only on egocentric depth and proprioception. The student learns how to react by matching the teacher's goal-in-context latent representation, which combines the next target pose with the local terrain, rather than separately encoding what it must perceive and how it must act. Results in simulation and on a real Unitree G1 humanoid demonstrate robust, zero-shot fall safety across diverse non-flat environments without real-world fine-tuning. The project page is available at https://vigor2026.github.io/
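The teacher-student distillation described above can be illustrated with a toy sketch. The abstract does not specify the network architecture, the loss, or any dimensions, so everything below is an assumption made for illustration: single linear layers stand in for the encoders, a mean-squared error stands in for the latent-matching objective, and all names (`linear`, `distill_step`, `run_demo`) and input values are hypothetical. The sketch only shows the idea that a frozen teacher encodes privileged inputs (next target pose plus local terrain) into a latent, and a student that sees only depth and proprioception is trained to reproduce that latent.

```python
import random

def linear(weights, x):
    """Bias-free dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mse(a, b):
    """Mean-squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def init_weights(rows, cols, rng):
    """Small random weights; rng is a seeded random.Random for repeatability."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def distill_step(w_student, z_teacher, student_in, lr):
    """One gradient-descent step on the latent-matching MSE.

    Updates w_student in place and returns the loss measured before the update.
    The teacher latent is treated as a fixed target (the teacher is frozen).
    """
    z_student = linear(w_student, student_in)
    n = len(z_teacher)
    for i in range(n):
        # d(loss)/d(z_student[i]) for the mean-squared error
        grad_out = 2.0 * (z_student[i] - z_teacher[i]) / n
        for j, x_j in enumerate(student_in):
            w_student[i][j] -= lr * grad_out * x_j
    return mse(z_student, z_teacher)

def run_demo(steps=200, lr=0.1, latent_dim=4):
    """Fit the student latent to the teacher latent on one toy sample."""
    rng = random.Random(0)
    # Hypothetical privileged teacher inputs: next target pose and terrain heights.
    target_pose = [0.2, -0.1, 0.4]
    terrain_patch = [0.0, 0.05, 0.1, 0.05]
    # Hypothetical deployable student inputs: depth readings and proprioception.
    depth = [0.9, 0.8, 0.7, 0.8, 0.9, 1.0]
    proprio = [0.1, -0.2, 0.3]

    w_teacher = init_weights(latent_dim, len(target_pose) + len(terrain_patch), rng)
    w_student = init_weights(latent_dim, len(depth) + len(proprio), rng)

    # Goal-in-context latent: pose and terrain are encoded jointly, not separately.
    z_teacher = linear(w_teacher, target_pose + terrain_patch)
    student_in = depth + proprio
    return [distill_step(w_student, z_teacher, student_in, lr) for _ in range(steps)]

if __name__ == "__main__":
    losses = run_demo()
    print("first loss:", losses[0])
    print("last loss:", losses[-1])
```

In this toy setting the student never receives the target pose or the terrain heights directly; it only learns to map its own observations to the teacher's joint latent. A real system would presumably use deep networks, many trajectories, and an additional action head, none of which are modeled here.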
