Despite recent progress in Reinforcement Learning for robotics applications, many tasks remain prohibitively difficult to solve because interactions with the environment are expensive. Transfer learning reduces training time in a target domain by transferring knowledge learned in a source domain. Sim2Real transfer applies this idea to move knowledge from a simulated robotic domain to a physical target domain, reducing the time required to train a task in the physical world, where interactions are costly. However, most existing approaches assume an exact correspondence between the two domains in both task structure and physical properties. This work proposes a framework for Few-Shot Policy Transfer between two domains through Observation Mapping and Behavior Cloning. We use Generative Adversarial Networks (GANs) together with a cycle-consistency loss to map observations between the source and target domains, and then use this learned mapping to clone the successful source-task behavior policy into the target domain. We observe successful behavior policy transfer with limited target-task interactions, and also in cases where the source and target tasks are semantically dissimilar.
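The abstract names the two ingredients but not their exact form. The following is a minimal Python sketch under stated assumptions. The cycle-consistency term follows the standard CycleGAN L1 formulation. G (source to target) and F (target to source) are toy 1-D affine maps standing in for the GAN generators. "Cloning through the mapping" is read as labeling each target observation y with the source policy's action on F(y) and then fitting a target policy by least-squares regression. The adversarial losses and neural networks are omitted. The helper names (`cycle_consistency_loss`, `clone_dataset`, `fit_linear_policy`) are hypothetical and do not come from the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

Obs = List[float]               # one observation vector
Mapping = Callable[[Obs], Obs]  # hypothetical stand-in for a GAN generator
Policy = Callable[[Obs], float] # toy assumption: scalar-action policy


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two observation vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G: Mapping, F: Mapping,
                           source_obs: List[Obs], target_obs: List[Obs]) -> float:
    """CycleGAN-style cycle term (assumed form): x -> G(x) -> F(G(x)) should
    return to x, and y -> F(y) -> G(F(y)) should return to y."""
    forward = sum(l1(F(G(x)), x) for x in source_obs) / len(source_obs)
    backward = sum(l1(G(F(y)), y) for y in target_obs) / len(target_obs)
    return forward + backward


def clone_dataset(F: Mapping, source_policy: Policy,
                  target_obs: List[Obs]) -> List[Tuple[Obs, float]]:
    """Label each target observation y with the action the source policy
    takes on its source-domain counterpart F(y) (assumed cloning scheme)."""
    return [(y, source_policy(F(y))) for y in target_obs]


def fit_linear_policy(dataset: List[Tuple[Obs, float]]) -> Tuple[float, float]:
    """Behavior cloning as 1-D least squares: fit a ~ w * y[0] + b."""
    ys = [y[0] for y, _ in dataset]
    acts = [a for _, a in dataset]
    n = len(ys)
    mean_y, mean_a = sum(ys) / n, sum(acts) / n
    var = sum((y - mean_y) ** 2 for y in ys)
    cov = sum((y - mean_y) * (a - mean_a) for y, a in zip(ys, acts))
    w = cov / var
    return w, mean_a - w * mean_y


if __name__ == "__main__":
    # Toy domains: target observations are an affine transform of source ones.
    G = lambda x: [2.0 * x[0] + 1.0]       # source -> target
    F = lambda y: [(y[0] - 1.0) / 2.0]     # target -> source (exact inverse of G)
    source = [[0.0], [0.5], [1.0]]
    target = [[1.0], [2.0], [3.0]]
    print(cycle_consistency_loss(G, F, source, target))  # 0.0

    source_policy = lambda x: -3.0 * x[0]  # toy "successful" source policy
    w, b = fit_linear_policy(clone_dataset(F, source_policy, target))
    print(w, b)  # -1.5 1.5, i.e. a = -3 * (y - 1) / 2
```

In a full CycleGAN-style setup, the cycle term would be added to the adversarial losses of both generators, and the regression would be replaced by supervised training of a neural target policy. The exact inverse maps here simply make the zero cycle loss and the recovered target policy easy to check by hand.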