COMPOSER: Scalable and Robust Modular Policies for Snake Robots

Snake robots show remarkable compliance and adaptability when interacting with their environments, mirroring their natural counterparts. Their hyper-redundant, high-dimensional bodies contribute to this adaptability but also make control challenging. Rather than treating hyper-redundancy and flexibility purely as obstacles, we see unexplored potential in leveraging these traits to improve robustness and generalizability at the control-policy level. We seek a control policy that breaks down the high dimensionality of snake robots while harnessing their redundancy. In this work, we treat the snake robot as a modular robot and formulate its control as a cooperative Multi-Agent Reinforcement Learning (MARL) problem in which each segment functions as an individual agent. Specifically, we incorporate a self-attention mechanism to enhance cooperation between agents, and we propose a high-level imagination policy that provides additional rewards to guide the low-level control policy. We validate the proposed method, COMPOSER, on five snake robot tasks: goal reaching, wall climbing, shape formation, tube crossing, and block pushing. COMPOSER achieves the highest success rate on all tasks compared with a centralized baseline and four modular policy baselines. It also shows enhanced robustness against module corruption and significantly better zero-shot generalizability. Videos of this work are available on our project page: https://sites.google.com/view/composer-snake/.
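
To make the modular formulation concrete, the following is a minimal sketch of how self-attention could mix information across per-segment agents. It is an illustration under stated assumptions, not COMPOSER's actual architecture: the function names `softmax` and `self_attention`, the use of raw observations as queries, keys, and values (no learned projections), and the toy observation vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(module_obs):
    """Scaled dot-product self-attention across modules.

    module_obs: list of per-module feature vectors of equal length.
    Assumption: queries, keys, and values are the raw observations;
    a trained policy would apply learned projections first.
    """
    d = len(module_obs[0])
    mixed = []
    for q in module_obs:
        # Similarity of this module's query to every module's key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in module_obs
        ]
        weights = softmax(scores)
        # Weighted sum of all modules' values.
        mixed.append(
            [sum(w * v[j] for w, v in zip(weights, module_obs)) for j in range(d)]
        )
    return mixed


# Hypothetical observations for a three-segment robot.
obs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = self_attention(obs)
```

Because the function iterates over however many modules it receives, the same code applies unchanged to a robot with more or fewer segments, which is one plausible reason a modular, attention-based policy can transfer across morphologies.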
