Control policy learning for modular robot locomotion has previously been limited to proprioceptive feedback and flat terrain. This paper develops vision-based policies that let modular systems traverse more challenging environments. These modular robots can be reconfigured into many different designs, each of which needs a controller to function. Though one could create a separate policy for each design and environment, such an approach does not scale to the wide range of potential designs and environments. To address this challenge, we create a visual-motor policy that can generalize to both new designs and new environments. The policy itself is modular: it is divided into components, each of which corresponds to a type of module (e.g., a leg, wheel, or body). The policy components can be recombined during training to learn to control multiple designs. We develop a deep reinforcement learning algorithm in which visual observations are input to a modular policy that interacts with multiple environments at once. We apply this algorithm to train robots with combinations of legs and wheels, then demonstrate the policy controlling real robots climbing stairs and curbs.
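The idea of a policy split into per-module-type components that are recombined for each design can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the names `components`, `make_component`, and `act`, the dimensions `OBS_DIM` and `VIS_DIM`, and the use of a single linear layer per module type are all hypothetical. The sketch also omits training and any communication between modules.

```python
import math
import random

random.seed(0)

OBS_DIM = 3  # hypothetical size of one module's proprioceptive observation
VIS_DIM = 2  # hypothetical size of a visual feature vector shared by all modules


def make_component(in_dim):
    """Create one linear component (weights and bias) for a module type."""
    return {"w": [random.uniform(-1, 1) for _ in range(in_dim)], "b": 0.0}


# One shared component per module *type*, not per robot design (assumption).
components = {
    "leg": make_component(OBS_DIM + VIS_DIM),
    "wheel": make_component(OBS_DIM + VIS_DIM),
}


def act(design, module_obs, visual_feat):
    """Assemble a policy for `design` by reusing the per-type components.

    design: list of module type names, e.g. ["leg", "wheel"]
    module_obs: one observation list (length OBS_DIM) per module
    visual_feat: feature list (length VIS_DIM) shared by all modules
    Returns one action in [-1, 1] per module.
    """
    actions = []
    for module_type, obs in zip(design, module_obs):
        comp = components[module_type]
        x = obs + visual_feat  # concatenate local and visual inputs
        z = sum(w * xi for w, xi in zip(comp["w"], x)) + comp["b"]
        actions.append(math.tanh(z))
    return actions


# Two different designs built from the same pool of components.
four_legs = ["leg"] * 4
legs_and_wheels = ["leg", "leg", "wheel", "wheel"]

visual = [0.5, -0.2]
obs = [0.1, 0.0, -0.3]

a1 = act(four_legs, [obs] * 4, visual)
a2 = act(legs_and_wheels, [obs] * 4, visual)
```

Because both designs look up the same `"leg"` entry, any update to that component during training would affect every design that contains a leg. This sharing is what allows a single set of parameters to control multiple designs in the sketch.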