We propose Structured Exploration with Achievements (SEA), a multi-stage reinforcement learning algorithm designed for achievement-based environments, that is, environments with an internal set of achievements. SEA first uses offline data to learn a representation of the known achievements with a determinant loss function. It then recovers the dependency graph of the learned achievements with a heuristic algorithm. Finally, it interacts with the environment online, using a controller built on the recovered dependency graph to learn policies that master the known achievements and explore new ones. We empirically demonstrate that SEA recovers the achievement structure accurately and improves exploration in hard domains such as Crafter, which are procedurally generated and have high-dimensional observations such as images.
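The abstract does not spell out the graph-recovery heuristic or the controller, so the following is only a minimal sketch under stated assumptions of how the second and third stages could fit together. It assumes achievements are already discrete labels (it skips the determinant-loss representation learning entirely) and that each offline trajectory is reduced to the order in which achievements were first unlocked. The rule "A is a prerequisite of B if A is unlocked strictly before B in every trajectory that unlocks B" and the greedy frontier controller are hypothetical stand-ins, not SEA's actual heuristic or controller. All function names are illustrative, and the Crafter-style achievement labels are used only as an example.

```python
def recover_dependencies(trajectories):
    """Hypothetical heuristic: A is treated as a prerequisite of B if, in every
    trajectory that unlocks B, A was unlocked strictly earlier."""
    achievements = {a for traj in trajectories for a in traj}
    # Start with every other achievement as a candidate prerequisite, then prune.
    deps = {b: achievements - {b} for b in achievements}
    for traj in trajectories:
        first_unlock = {}
        for i, a in enumerate(traj):
            first_unlock.setdefault(a, i)  # keep the first unlock index only
        for b, i in first_unlock.items():
            # Keep only candidates that were unlocked before b in this trajectory.
            deps[b] = {a for a in deps[b] if first_unlock.get(a, i) < i}
    return deps


def transitive_reduction(deps):
    """Keep only direct prerequisites (the rule above yields a transitively closed DAG)."""
    reduced = {}
    for b, parents in deps.items():
        indirect = set()
        for c in parents:
            indirect |= deps[c]
        reduced[b] = parents - indirect
    return reduced


def select_target(graph, mastered):
    """Hypothetical greedy controller: pick an unmastered achievement whose direct
    prerequisites are all mastered. Because the graph is acyclic, this returns
    None only when every known achievement is mastered, i.e. time to explore."""
    frontier = sorted(
        b for b, parents in graph.items() if b not in mastered and parents <= mastered
    )
    return frontier[0] if frontier else None


# Usage with Crafter-style achievement labels (illustrative data only).
trajs = [
    ["collect_wood", "place_table", "make_wood_pickaxe", "collect_stone"],
    ["collect_wood", "collect_sapling", "place_table", "make_wood_pickaxe"],
    ["collect_sapling", "collect_wood", "place_table"],
]
graph = transitive_reduction(recover_dependencies(trajs))
print(graph["collect_stone"])  # {'make_wood_pickaxe'}
print(select_target(graph, {"collect_wood", "collect_sapling"}))  # place_table
```

Pruning to direct prerequisites keeps the controller's frontier check cheap: a target only needs its immediate parents mastered, since those parents' own prerequisites were necessarily mastered earlier.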