The co-adaptation of robots has been a long-standing research endeavour with the goal of adapting both the body and the behaviour of a system for a given task, inspired by the natural evolution of animals. Co-adaptation has the potential to eliminate costly manual hardware engineering as well as to improve the performance of systems. The standard approach to co-adaptation is to use a reward function for optimizing behaviour and morphology. However, defining and constructing such reward functions is notoriously difficult and often requires significant engineering effort. This paper introduces a new viewpoint on the co-adaptation problem, which we call co-imitation: finding a morphology and a policy that allow an imitator to closely match the behaviour of a demonstrator. To this end, we propose a co-imitation methodology for adapting behaviour and morphology by matching the state distributions of the demonstrator. Specifically, we focus on the challenging scenario in which the state and action spaces of the two agents are mismatched. We find that co-imitation increases behaviour similarity across a variety of tasks and settings, and we demonstrate co-imitation by transferring human walking, jogging and kicking skills onto a simulated humanoid.