Quadruped robots are used for primary searches during the early stages of indoor fires. A typical primary search involves quickly and thoroughly looking for victims under hazardous conditions while monitoring flammable materials. However, maintaining situational awareness in complex indoor environments and climbing different types of staircases rapidly remain the main challenges for robot-assisted primary searches. In this project, we designed a two-stage end-to-end deep reinforcement learning (RL) approach that jointly optimizes navigation and locomotion. In the first stage, a Unitree Go2 quadruped was trained to climb stairs on Isaac Lab's pyramid-stair terrain. In the second stage, the policy learned in the first stage was transferred and further trained to climb a variety of realistic indoor staircases in Isaac Lab. These staircases include straight, L-shaped, and spiral configurations, so that the policy covers climbing tasks in complex environments. This project explores how to balance navigation and locomotion, and how end-to-end RL methods can enable quadrupeds to adapt to different stair shapes. Our main contributions are: (1) a two-stage end-to-end RL framework that transfers stair-climbing skills from abstract pyramid terrain to realistic indoor stair topologies; (2) a centerline-based navigation formulation that enables unified learning of navigation and locomotion without hierarchical planning; (3) a demonstration of policy generalization across diverse staircases using only local height-map perception; and (4) an empirical analysis of success rate, efficiency, and failure modes under increasing stair difficulty.