Fall recovery is critical for autonomous legged locomotion. Existing methods have shown that some legged robots, such as humanoids and quadrupeds, can recover from diverse fallen postures by using their arms or coordinating multiple legs to generate support forces. A bipedal-wheeled robot has neither arms nor additional legs to provide such support and must rely solely on the actuation of its two legs, which makes recovery particularly difficult. To address this, we introduce FTSR (Force-guided Teacher-student framework with Stage-wise Rewards). During simulation training, the force-guided method applies an external auxiliary force tied directly to the robot's real-time height and explicitly formulates this force as an optimizable constraint. Through constrained reinforcement learning, the policy is guided to gradually reduce its dependence on the force while increasing the body height, so that it develops its own recovery strategies despite having no arms for support. Height-progressive stage-wise rewards structure posture stabilization during recovery and the transition to sustained locomotion, and a teacher-student architecture distills privileged knowledge of force effects and recovery dynamics. After simulation training, the policy is deployed on a physical armless bipedal-wheeled robot and extensively evaluated. Experiments confirm robust and reliable fall recovery under diverse challenging conditions, showing strong environmental adaptability and motion robustness while maintaining full post-recovery motion capability. The framework also transfers to a high-DOF humanoid, indicating that it generalizes beyond the bipedal-wheeled platform. The project page is available at https://2350575870.github.io/force-guided.github.io/
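As a rough illustration of the force-guided idea described above, the following Python sketch shows one way a height-dependent auxiliary force could be combined with a constraint on force usage. The abstract does not specify the force law or the constrained reinforcement learning algorithm, so everything here is an assumption: the linear fade with height, the Lagrangian dual-ascent update, and all names (`assist_force`, `update_multiplier`, `penalized_reward`, `target_height`, `max_force`, `budget`, `lr`) are hypothetical and not taken from FTSR.

```python
def assist_force(height, target_height, max_force):
    """Upward auxiliary force (simulation only) that fades as the body rises.

    Assumed form: full force at height 0, linearly decreasing to zero
    at target_height and above.
    """
    deficit = 1.0 - height / target_height
    deficit = min(1.0, max(0.0, deficit))  # clamp to [0, 1]
    return max_force * deficit


def update_multiplier(lam, mean_force_cost, budget, lr):
    """One dual-ascent step for the constraint E[force cost] <= budget.

    The multiplier grows while the policy leans on the force more than
    the budget allows, and is kept non-negative.
    """
    return max(0.0, lam + lr * (mean_force_cost - budget))


def penalized_reward(task_reward, force_cost, lam):
    """Lagrangian-style reward: task reward minus weighted force usage."""
    return task_reward - lam * force_cost


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    for h in (0.0, 0.25, 0.5):
        print(h, assist_force(h, target_height=0.5, max_force=100.0))
```

In a sketch like this, a growing multiplier makes reliance on the auxiliary force increasingly costly, which is one plausible way to push a policy toward recovering without it before deployment on hardware, where no such force exists.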