Enabling legged robots to perform non-prehensile loco-manipulation is crucial for enhancing their versatility. Learning behaviors such as whole-body object pushing often requires sophisticated planning strategies or extensive task-specific reward shaping, especially in unstructured environments. In this work, we present CAIMAN, a practical reinforcement learning framework that encourages the agent to gain control over other entities in the environment. CAIMAN leverages causal action influence as an intrinsic motivation objective, allowing legged robots to efficiently acquire object pushing skills even under sparse task rewards. We employ a hierarchical control strategy, combining a low-level locomotion module with a high-level policy that generates task-relevant velocity commands and is trained to maximize the intrinsic reward. To estimate causal action influence, we learn the dynamics of the environment by integrating a kinematic prior with data collected during training. We empirically demonstrate CAIMAN's superior sample efficiency and adaptability to diverse scenarios in simulation, as well as its successful transfer to real-world systems without further fine-tuning. A video demo is available at https://www.youtube.com/watch?v=dNyvT04Cqaw.
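The abstract does not spell out how the causal action influence score is computed; following the common formulation in the causal-influence literature, it can be estimated as the conditional mutual information between the action and the object's next state, approximated via KL divergences between per-action predictive distributions and their mixture. The sketch below is a minimal illustration under that assumption, with a hypothetical `dynamics(state, action)` callable returning a diagonal-Gaussian prediction (mean, variance); none of these names come from the paper.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL divergence KL(p || q) between two diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def causal_action_influence(dynamics, state, action_samples):
    """Estimate a causal action influence (CAI) score for `state`.

    Approximates I(a; s' | s): each sampled action gives a predictive
    Gaussian over the next object state; the score is the mean KL from
    each per-action Gaussian to a moment-matched Gaussian fit to the
    mixture over actions. If the action cannot affect the object, all
    predictions coincide and the score is zero.
    """
    mus, vars_ = zip(*(dynamics(state, a) for a in action_samples))
    mus, vars_ = np.asarray(mus), np.asarray(vars_)
    # Moment-match the mixture: mean of means, law of total variance.
    mix_mu = mus.mean(axis=0)
    mix_var = vars_.mean(axis=0) + ((mus - mix_mu) ** 2).mean(axis=0)
    return float(np.mean(
        [gaussian_kl(m, v, mix_mu, mix_var) for m, v in zip(mus, vars_)]
    ))
```

Used as an intrinsic reward, this score is high in states where the robot's velocity commands measurably change the predicted object motion (e.g. when in contact with the box) and near zero otherwise, which is what steers exploration toward manipulation even under sparse task rewards.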