Potential-based reward shaping (PBRS) is a class of machine learning methods that aims to speed up the learning of a reinforcement learning agent by extracting and using additional knowledge while the agent performs a task. Transfer learning involves two steps: extracting knowledge from previously learned tasks and transferring that knowledge to a target task. The latter step has been widely discussed in the literature, with various methods proposed for it, while the former has received less attention. The type of knowledge that is transferred is therefore important and can lead to considerable improvement. In the literature on both transfer learning and potential-based reward shaping, one subject that has not previously been addressed is the knowledge gathered during the learning process itself. In this paper, we present a novel potential-based reward shaping method that extracts knowledge from the learning process, specifically from the cumulative rewards of episodes. The proposed method is evaluated in the Arcade Learning Environment, and the results indicate an improvement in the learning process for both single-task and multi-task reinforcement learning agents.
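For readers unfamiliar with PBRS, the following minimal sketch illustrates the standard shaping term F(s, s') = γΦ(s') − Φ(s) added to the environment reward. It is not the method proposed in this paper: the potential `toy_potential`, the integer state encoding, and the helper names are hypothetical and chosen only for illustration. The sketch also checks the telescoping property that makes shaping of this form well behaved: the discounted sum of shaping terms along a trajectory depends only on its first and last states.

```python
from typing import Callable, Sequence


def shaping_reward(phi: Callable[[int], float], s: int, s_next: int, gamma: float) -> float:
    """Standard PBRS term: F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def discounted_shaping_sum(phi: Callable[[int], float], states: Sequence[int], gamma: float) -> float:
    """Discounted sum of shaping terms along a trajectory of states."""
    total = 0.0
    for t in range(len(states) - 1):
        total += (gamma ** t) * shaping_reward(phi, states[t], states[t + 1], gamma)
    return total


def toy_potential(s: int) -> float:
    # Hypothetical potential (an assumption for this sketch):
    # negative distance to a goal at state 5 on a line of integer states.
    return -abs(5 - s)


trajectory = [0, 1, 2, 3, 4, 5]
gamma = 0.9

total = discounted_shaping_sum(toy_potential, trajectory, gamma)

# Telescoping identity: the sum equals gamma^T * phi(s_T) - phi(s_0).
T = len(trajectory) - 1
expected = (gamma ** T) * toy_potential(trajectory[-1]) - toy_potential(trajectory[0])
```

In a learner, `shaping_reward(...)` would simply be added to the environment reward at each step. How the potential is built from episodes' cumulative rewards is specific to the proposed method and is not reproduced here.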