We propose a state estimation method that accurately predicts a robot's privileged states, with the aim of pushing the limits of quadruped robots in executing advanced skills such as jumping in the wild. In particular, we present the State Estimation Transformer (SET), an architecture that casts state estimation as a conditional sequence modeling problem. Using a causally masked Transformer, SET outputs robot states that are hard to obtain directly in the real world, such as body height and velocities. By conditioning an autoregressive model on the robot's past states, SET predicts these privileged observations accurately even during highly dynamic locomotion. We evaluate our method on three tasks (running jumping, running backflipping, and running sideslipping) on a low-cost quadruped robot, Cyberdog2. Results show that SET outperforms other methods in estimation accuracy and transferability in simulation, as well as in the success rates of jumping and of triggering a recovery controller in the real world, suggesting that a Transformer-based explicit state estimator is advantageous in highly dynamic locomotion tasks.
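To make the idea of a causally masked estimator concrete, the following is a minimal sketch in plain Python, not the SET architecture itself. It assumes a single attention head, no positional encoding, random weights standing in for trained parameters, and hypothetical dimensions (`OBS_DIM`, `HID_DIM`, `PRIV_DIM`); all function and parameter names are illustrative. It shows only the causal property: the estimate at step t depends on observations up to and including step t.

```python
import math
import random


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def rand_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters (assumption)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def causal_self_attention(xs, wq, wk, wv):
    """Single-head attention in which step t attends only to steps 0..t."""
    qs = [matvec(wq, x) for x in xs]
    ks = [matvec(wk, x) for x in xs]
    vs = [matvec(wv, x) for x in xs]
    d = len(qs[0])
    out = []
    for t, q in enumerate(qs):
        # Causal mask: scores are computed only for past and current steps.
        scores = [sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * vs[s][j] for s, w in enumerate(weights))
                    for j in range(d)])
    return out


def estimate_privileged(history, params):
    """Map an observation history to one privileged-state estimate per step."""
    hidden = causal_self_attention(history, params["wq"], params["wk"], params["wv"])
    return [matvec(params["w_out"], h) for h in hidden]


rng = random.Random(0)
# Hypothetical sizes: 6 observed values per step, 4 privileged outputs
# (for example body height plus three velocity components).
OBS_DIM, HID_DIM, PRIV_DIM = 6, 8, 4
params = {
    "wq": rand_matrix(HID_DIM, OBS_DIM, rng),
    "wk": rand_matrix(HID_DIM, OBS_DIM, rng),
    "wv": rand_matrix(HID_DIM, OBS_DIM, rng),
    "w_out": rand_matrix(PRIV_DIM, HID_DIM, rng),
}
history = [[rng.gauss(0.0, 1.0) for _ in range(OBS_DIM)] for _ in range(5)]
estimates = estimate_privileged(history, params)
```

Because of the mask, perturbing the most recent observation leaves all earlier estimates unchanged, which is what allows such an estimator to run online on a robot using only the observations seen so far.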