Improving sample efficiency is a key challenge in reinforcement learning, especially in environments with large state spaces and sparse rewards. The literature addresses this challenge either through auxiliary tasks (subgoals) or through carefully designed exploration strategies. Exploration methods have been used to sample better trajectories in large environments, while auxiliary tasks have been incorporated where the reward is sparse. However, few studies have attempted to tackle both large scale and reward sparsity at the same time. This paper explores combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy. We present a way to learn value functions that can be used to sample actions and provide directed exploration. Experiments on navigation tasks with varying grid sizes demonstrate performance advantages over several competitive baselines.
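To make the notion of a value function that can be sampled for directed exploration more concrete, the following is a minimal tabular sketch. It is not the method of this paper: the grid environment, the single subgoal, the Q-learning-style update, the softmax sampling rule, the hyperparameters, and all names (`step`, `learn_gvf`, `sample_action`) are assumptions made purely for illustration. The GVF here predicts the discounted cumulant "reach the subgoal", with a continuation of zero at the subgoal.

```python
import math
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action, size):
    """Deterministic grid move; walls keep the agent inside the grid."""
    r, c = state
    dr, dc = ACTIONS[action]
    return (min(max(r + dr, 0), size - 1), min(max(c + dc, 0), size - 1))


def learn_gvf(size, subgoal, gamma=0.9, alpha=0.5,
              episodes=500, horizon=50, seed=0):
    """Tabular off-policy TD learning of one GVF's action values.

    Assumed (hypothetical) GVF question:
      cumulant     = 1 on entering `subgoal`, else 0
      continuation = 0 at `subgoal`, else `gamma`
      target policy = greedy w.r.t. the GVF; behaviour = uniform random
    """
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(size) for c in range(size) for a in range(4)}
    for _ in range(episodes):
        s = (rng.randrange(size), rng.randrange(size))
        for _ in range(horizon):
            if s == subgoal:
                break
            a = rng.randrange(4)
            s2 = step(s, a, size)
            cumulant = 1.0 if s2 == subgoal else 0.0
            cont = 0.0 if s2 == subgoal else gamma
            target = cumulant + cont * max(q[(s2, b)] for b in range(4))
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q


def sample_action(q, state, temperature=0.1, rng=random):
    """Softmax sampling over GVF action values: exploration directed
    toward the subgoal instead of uniformly random."""
    prefs = [q[(state, a)] / temperature for a in range(4)]
    m = max(prefs)  # subtract the max for numerical stability
    weights = [math.exp(p - m) for p in prefs]
    return rng.choices(range(4), weights=weights)[0]


if __name__ == "__main__":
    q = learn_gvf(size=5, subgoal=(4, 4))
    print(sample_action(q, (0, 0)))
```

In this sketch a lower `temperature` makes the sampled actions follow the learned GVF more closely, while a higher one falls back toward uniform exploration; how the paper itself balances the two is not stated in the abstract.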