Real-time control for robotics is a popular research area in the reinforcement learning community. Through techniques such as reward shaping, researchers have trained online agents across a wide range of domains. Despite these advances, solving goal-oriented tasks still requires complex architectural changes or hard constraints on the problem. In this article, we solve the problem of stacking multiple cubes by combining curriculum learning, reward shaping, and a large number of efficiently parallelized environments. We introduce two curriculum learning settings that divide the complex task into sequential sub-goals, enabling the agent to learn a problem that would otherwise be too difficult. We focus on the challenges encountered while implementing these settings in a goal-conditioned environment. Finally, we apply the best-performing configuration to a higher-complexity environment containing differently shaped objects.
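The abstract does not say how the two curriculum settings are scheduled or how the reward is shaped. The following is only a minimal sketch under stated assumptions. It shows one common pattern: a success-gated curriculum in which stage `k` asks the agent to stack `k` cubes, paired with a dense distance-based shaping term. The names `StackingCurriculum` and `shaped_reward`, along with the promotion threshold, window size, and weight `w`, are all hypothetical and do not come from the paper.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StackingCurriculum:
    """Hypothetical success-gated curriculum: stage k = stack k cubes."""
    max_cubes: int = 3
    threshold: float = 0.8  # assumed success rate required to advance
    window: int = 100       # assumed number of recent episodes considered
    stage: int = 1
    history: deque = field(default_factory=deque)

    def record(self, success: bool) -> None:
        # Keep a sliding window of the most recent episode outcomes.
        self.history.append(success)
        if len(self.history) > self.window:
            self.history.popleft()
        # Advance to the next sub-goal once the window is full and the
        # recent success rate meets the threshold; reset stats on promotion.
        full = len(self.history) == self.window
        rate = sum(self.history) / self.window
        if full and rate >= self.threshold and self.stage < self.max_cubes:
            self.stage += 1
            self.history.clear()


def shaped_reward(gripper_pos, cube_pos, target_pos, placed, w=0.5):
    """Assumed shaping: sparse success bonus minus weighted distances."""
    reach = math.dist(gripper_pos, cube_pos)  # gripper-to-cube distance
    place = math.dist(cube_pos, target_pos)   # cube-to-target distance
    return (1.0 if placed else 0.0) - w * (reach + place)
```

In a setup with many parallel environments, each worker would report episode outcomes to a single shared curriculum object. The stage would then set how many cubes the next goal requires.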