Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. Because these representations are shared across tasks, agents can exploit knowledge of previously encountered symbols and transitions, which enhances transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.
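To make the idea of a reward machine concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a deterministic machine stored as a dictionary, treats "optimal transition" as the first edge on a shortest path to a terminal abstract state, and adds a fixed bonus when the agent achieves that transition. The names `RewardMachine`, `suggested_symbol`, `shaped_reward`, and `bonus`, along with the toy key-and-door task, are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Hypothetical minimal reward machine: (state, symbol) -> (next_state, reward)."""
    initial: str
    terminal: str
    transitions: dict = field(default_factory=dict)

    def step(self, state, symbol):
        # Symbols with no outgoing edge leave the abstract state unchanged.
        return self.transitions.get((state, symbol), (state, 0.0))

    def suggested_symbol(self, state):
        """First symbol on a shortest path from `state` to the terminal state.

        Shortest path length is an assumed stand-in for "optimal transition".
        Returns None if the terminal state is unreachable or already reached.
        """
        if state == self.terminal:
            return None
        queue = deque([(state, None)])
        seen = {state}
        while queue:
            current, first = queue.popleft()
            for (src, symbol), (dst, _) in self.transitions.items():
                if src != current or dst in seen:
                    continue
                head = symbol if first is None else first
                if dst == self.terminal:
                    return head
                seen.add(dst)
                queue.append((dst, head))
        return None


def shaped_reward(rm, state, symbol, bonus=0.1):
    """Task reward plus an assumed bonus when the suggested transition is achieved."""
    next_state, reward = rm.step(state, symbol)
    if symbol == rm.suggested_symbol(state) and next_state != state:
        reward += bonus
    return next_state, reward


# Toy task (assumed for illustration): fetch a key, then open a door.
rm = RewardMachine(
    initial="u0",
    terminal="u2",
    transitions={
        ("u0", "key"): ("u1", 0.0),
        ("u1", "door"): ("u2", 1.0),
    },
)
```

In this sketch, the symbol returned by `rm.suggested_symbol(state)` could be fed to the agent alongside its observation. Because symbols such as `"key"` or `"door"` can recur in other tasks, a policy conditioned on them has something to reuse when the machine changes.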