Despite many successful applications of data-driven control in robotics, extracting diverse yet meaningful behaviors remains a challenge. Typically, task performance must be compromised to achieve diversity. In many scenarios, task requirements are specified as a multitude of reward terms, each requiring a different trade-off. In this work, we take a constrained-optimization view of the quality-diversity trade-off and show that diverse policies can be obtained while imposing constraints on their value functions, which are defined through distinct rewards. In line with previous work, the diversity level can be further controlled through an attract-repel reward term motivated by the Van der Waals force. We demonstrate the effectiveness of our method on a local navigation task in which a quadruped robot must reach a target within a finite horizon. Finally, our trained policies transfer well to the real 12-DoF quadruped robot Solo12 and exhibit diverse, agile behaviors with successful obstacle traversal.
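To give a feel for the attract-repel idea mentioned above, the following is a minimal sketch and not the formulation used in this work. It assumes each policy is summarized by a feature vector, that diversity is measured by the Euclidean distance to the nearest other policy, and that the attract-repel term takes a Van der Waals-like cubic form around a target distance. The function names, the exponent, and the `target_distance` parameter are all hypothetical choices made for illustration.

```python
import math


def attract_repel_coefficient(distance, target_distance, eps=1e-8):
    """Illustrative Van der Waals-like coefficient (an assumed form).

    Negative when `distance` is below `target_distance`, zero exactly at
    the target, and positive (approaching 1) when far beyond it. How this
    coefficient enters the reward (sign, scaling) is left open here.
    """
    d = max(distance, eps)  # guard against division by zero
    return 1.0 - (target_distance / d) ** 3


def nearest_neighbor_distance(features, index):
    """Euclidean distance from policy `index` to its closest other policy.

    `features` is a list of equal-length feature vectors, one per policy
    (a hypothetical summary of each policy's behavior).
    """
    own = features[index]
    return min(
        math.dist(own, other)
        for j, other in enumerate(features)
        if j != index
    )


# Three hypothetical policies described by 2-D behavior features.
features = [[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]]
target = 1.0

close = attract_repel_coefficient(nearest_neighbor_distance(features, 0), target)
far = attract_repel_coefficient(nearest_neighbor_distance(features, 2), target)
```

In this sketch, the first policy sits closer to its neighbor than the target distance and gets a negative coefficient, while the third sits farther away and gets a positive one. Changing `target_distance` shifts the point at which the coefficient changes sign, which is one way a single parameter could control the diversity level.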