In multi-goal reinforcement learning, an agent learns policies for achieving multiple goals from experiences gathered by interacting with the environment. A key challenge in this setting is training with sparse binary rewards, which is difficult because successful experiences are scarce. Hindsight experience replay (HER) addresses this challenge by generating successful experiences from unsuccessful ones. However, generating successful experiences from uniformly sampled ones can be inefficient. This paper proposes a novel approach, Failed goal Aware HER (FAHER), to improve sampling efficiency. The approach exploits the relationship between achieved goals and failed goals, where a failed goal is an original goal that was not achieved. The proposed method assigns episodes with different achieved goals to clusters using a cluster model and then samples experiences in the manner of HER. The cluster model is obtained by applying a clustering algorithm to the failed goals. The method is validated on three robotic control tasks from OpenAI Gym. The experimental results show that the proposed method is more sample efficient and outperforms baseline approaches.
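The pipeline described in the abstract (fit a cluster model on failed goals, assign episodes to clusters by their achieved goals, then sample in the manner of HER) might be sketched as follows. This is only an illustration under several assumptions that the abstract does not specify: plain k-means as the clustering algorithm, the final achieved goal as the key for assigning an episode to a cluster, uniform selection over non-empty clusters, the HER "future" relabelling strategy, and a distance tolerance `tol` for the binary reward. All function names and the episode layout (`achieved`, `desired`) are hypothetical.

```python
import numpy as np

def collect_failed_goals(episodes, tol=0.05):
    """Desired goals of episodes whose final achieved goal missed them."""
    failed = [ep["desired"] for ep in episodes
              if np.linalg.norm(ep["achieved"][-1] - ep["desired"]) >= tol]
    return np.stack(failed)

def fit_kmeans(points, k, n_iters=20, seed=0):
    """Plain k-means (an assumed choice); returns a (k, dim) centroid array."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def assign_clusters(centroids, goals):
    """Index of the nearest centroid for each goal."""
    dists = np.linalg.norm(goals[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def sample_relabelled(episodes, centroids, batch_size, rng, tol=0.05):
    """Pick a cluster, then an episode in it, then relabel HER-style ('future')."""
    final_achieved = np.stack([ep["achieved"][-1] for ep in episodes])
    labels = assign_clusters(centroids, final_achieved)
    non_empty = np.unique(labels)
    batch = []
    for _ in range(batch_size):
        cluster = rng.choice(non_empty)
        ep_idx = int(rng.choice(np.flatnonzero(labels == cluster)))
        achieved = episodes[ep_idx]["achieved"]  # shape (T + 1, dim)
        horizon = len(achieved) - 1
        t = int(rng.integers(0, horizon))
        future = int(rng.integers(t + 1, horizon + 1))
        new_goal = achieved[future]
        # Sparse binary reward: 0 if the next achieved goal matches the new goal.
        hit = np.linalg.norm(achieved[t + 1] - new_goal) < tol
        batch.append((ep_idx, t, new_goal, 0.0 if hit else -1.0))
    return batch
```

Selecting a cluster before selecting an episode is one way to spread samples across regions of goal space rather than drawing episodes uniformly; whether FAHER weights clusters this way is not stated in the abstract.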