Tasks involving locally unstable or discontinuous dynamics (such as bifurcations and collisions) remain challenging in robotics because small variations in the environment can have a significant impact on task outcomes. For such tasks, learning a robust deterministic policy is difficult. We focus on structuring exploration with multiple stochastic policies, using a mixture-of-experts (MoE) policy representation that can be adapted efficiently. The MoE policy is composed of stochastic sub-policies, which allow exploration of multiple distinct regions of the action space (or strategies), and a high-level selection policy, which guides exploration towards the most promising regions. We develop a robot system to evaluate our approach in a real-world physical problem-solving domain. After the MoE policy is trained in simulation, online learning in the real world achieves efficient adaptation within just a few dozen attempts, with a minimal sim2real gap. Our results confirm that representing multiple strategies promotes efficient adaptation in new environments, and that strategies learned under different dynamics can still provide useful information about where to look for good strategies.
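To make the two-level structure concrete, the following is a minimal sketch of an MoE policy with a softmax selection policy over Gaussian sub-policies. It is an illustration under stated assumptions, not the method evaluated in this work: the Gaussian form of the sub-policies, the gradient-bandit update of the selection preferences, the reward-weighted update of the sub-policy means, rewards in [0, 1], and all names (`GaussianExpert`, `MoEPolicy`, `run_attempts`, `toy_reward`) are hypothetical choices made for the example.

```python
import math
import random


class GaussianExpert:
    """One stochastic sub-policy: an isotropic Gaussian over actions (assumed form)."""

    def __init__(self, mean, std):
        self.mean = list(mean)
        self.std = std

    def sample(self, rng):
        return [rng.gauss(m, self.std) for m in self.mean]


class MoEPolicy:
    """High-level softmax selection over experts plus per-expert exploration."""

    def __init__(self, experts, lr=0.5, seed=0):
        self.experts = experts
        self.prefs = [0.0] * len(experts)  # selection preferences (logits)
        self.lr = lr
        self.rng = random.Random(seed)

    def selection_probs(self):
        # Numerically stable softmax over the selection preferences.
        top = max(self.prefs)
        exps = [math.exp(p - top) for p in self.prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self):
        # Pick an expert (strategy), then sample an action from it.
        probs = self.selection_probs()
        k = self.rng.choices(range(len(self.experts)), weights=probs)[0]
        return k, self.experts[k].sample(self.rng)

    def update(self, k, action, reward):
        # Assumed rule 1: gradient-bandit step on the selection logits.
        probs = self.selection_probs()
        for i in range(len(self.prefs)):
            indicator = 1.0 if i == k else 0.0
            self.prefs[i] += self.lr * reward * (indicator - probs[i])
        # Assumed rule 2: move the chosen expert's mean toward rewarded actions.
        expert = self.experts[k]
        expert.mean = [m + self.lr * reward * (a - m)
                       for m, a in zip(expert.mean, action)]


def toy_reward(action):
    """Hypothetical task: success only inside a narrow action band."""
    return 1.0 if 0.5 < action[0] < 1.1 else 0.0


def run_attempts(policy, reward_fn, n_attempts):
    """Online adaptation loop: act, observe the outcome, update."""
    for _ in range(n_attempts):
        k, action = policy.act()
        policy.update(k, action, reward_fn(action))


if __name__ == "__main__":
    experts = [GaussianExpert([m], 0.2) for m in (-1.0, 1.0, 3.0)]
    policy = MoEPolicy(experts)
    run_attempts(policy, toy_reward, 200)
    print(policy.selection_probs())
```

In this toy setting, only the sub-policy whose mean starts near the success band can be rewarded, so the selection policy concentrates its probability there while that sub-policy continues to explore locally. Keeping the other sub-policies in the mixture is what would allow the selection policy to shift if the dynamics changed.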