Humanoid behavior foundation models aim to acquire reusable whole-body control policies from broad human motion priors, enabling a single controller to produce diverse and expressive behaviors. However, existing motion-centric foundation policies largely assume that the reference motion is already physically compatible with the robot's surroundings. This assumption breaks down when the demonstrator, operator, and robot inhabit different environments: a human motion may specify the intended behavior, but not the footholds, clearance, body height, or contact timing required by the robot's local terrain. We introduce the \emph{Perceptive Behavior Foundation Model} (Perceptive BFM), a terrain-aware humanoid control framework that grounds human motion priors in robot-centric perception. The model preserves raw kinematic motion references as the behavioral interface, while using local terrain observations to adapt contacts, posture, and timing. To provide scalable terrain supervision, we develop \emph{terrain-conformal reference synthesis} (TCRS), which converts locomotion-oriented human motion clips into terrain-consistent references through contact-aware foothold construction, foot-geometry-aware swing optimization, support-aware root reconstruction, collision repair, and multi-point inverse kinematics. We then train a blind adapted-reference teacher and transfer its terrain-conformal behavior to a deployed raw-reference student through target-frame action alignment. The student is an identity-gated Transformer tracker whose terrain features enter through residual pathways that are initialized to preserve the motion-tracking prior and trained to produce local corrections only when needed.
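The identity-initialized residual pathway described above can be illustrated with a minimal sketch. The abstract does not specify the gate's form, so everything here is an assumption made for illustration: the class name `GatedTerrainResidual`, the scalar `tanh` gate, the single linear map on the terrain features, and all numeric values are hypothetical. The sketch shows only the general idea that a zero-initialized gate leaves the tracker's hidden state unchanged until training moves the gate away from zero.

```python
import math


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class GatedTerrainResidual:
    """Hypothetical residual pathway: out = hidden + tanh(gate) * f(terrain)."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias
        # Assumed initialization: gate = 0, so tanh(gate) = 0 and the
        # pathway is an exact identity map on the hidden state.
        self.gate = 0.0

    def __call__(self, hidden, terrain):
        correction = linear(self.weights, self.bias, terrain)
        scale = math.tanh(self.gate)
        return [h + scale * c for h, c in zip(hidden, correction)]


# Illustrative values only: 3 terrain features, 2 hidden units.
layer = GatedTerrainResidual(
    weights=[[0.1, 0.0, -0.2], [0.3, 0.1, 0.0]],
    bias=[0.0, 0.0],
)
hidden = [0.5, -1.0]
terrain = [0.2, 0.0, -0.1]

out_init = layer(hidden, terrain)     # equals hidden while the gate is zero
layer.gate = 0.5                      # stand-in for a value reached by training
out_trained = layer(hidden, terrain)  # now includes a terrain correction
```

With the gate at zero, the output matches the motion-tracking hidden state exactly, which is one simple way to preserve a pretrained prior when a new input pathway is added. Once the gate is nonzero, the terrain-dependent term shifts the hidden state.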