Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Empirical results show that our representations improve sample efficiency and few-shot transfer in a variety of domains.
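To make the idea of a reward machine concrete, the following is a minimal sketch and not the authors' implementation. It assumes a reward machine can be stored as a dictionary of symbol-labelled transitions between abstract states, that the "optimal" transition is the first step on a shortest path to a terminal abstract state, and that the agent receives a fixed bonus for achieving it. The names `RewardMachine`, `next_symbol`, `shaped_step`, and `bonus`, as well as the key-and-door task, are hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state task abstraction: abstract states linked by symbol-labelled transitions."""
    initial: str
    terminal: str
    # transitions[state][symbol] = (next_state, reward); layout is an assumption of this sketch
    transitions: dict = field(default_factory=dict)

    def step(self, state, symbol):
        """Advance the machine; an unlisted symbol leaves the state unchanged with zero reward."""
        return self.transitions.get(state, {}).get(symbol, (state, 0.0))

    def next_symbol(self, state):
        """Return the first symbol on a shortest path from `state` to the terminal state.

        Shortest path (breadth-first search) stands in for "optimal" here; this is an
        assumption of the sketch. Returns None if already terminal or if no path exists.
        """
        queue = deque([(state, None)])
        seen = {state}
        while queue:
            current, first = queue.popleft()
            if current == self.terminal:
                return first
            for symbol, (nxt, _) in self.transitions.get(current, {}).items():
                if nxt not in seen:
                    seen.add(nxt)
                    # Remember only the first symbol taken from the starting state.
                    queue.append((nxt, first or symbol))
        return None


def shaped_step(rm, state, symbol, bonus=1.0):
    """Apply one observed symbol and add a bonus if it matches the suggested transition."""
    target = rm.next_symbol(state)  # symbolic hint that would be shown to the agent
    next_state, reward = rm.step(state, symbol)
    if target is not None and symbol == target:
        reward += bonus
    return next_state, reward, target


# Hypothetical task: pick up a key, then open a door.
rm = RewardMachine(
    initial="u0",
    terminal="u2",
    transitions={
        "u0": {"key": ("u1", 0.0)},
        "u1": {"door": ("u2", 1.0)},
    },
)
hint = rm.next_symbol(rm.initial)             # symbol the agent should achieve next
result = shaped_step(rm, rm.initial, "key")   # (next abstract state, shaped reward, hint)
```

Because the hint is a symbol rather than a task-specific state, a second task that reuses `key` or `door` in a different order would present the agent with symbols it has already learned to achieve, which is the intuition behind sharing representations across tasks.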