Reinforcement learning (RL) is a sub-domain of machine learning concerned with solving sequential decision-making problems: a learning agent interacts with the decision environment and improves its behavior through the rewards it receives. This learning paradigm is, however, well known to be time-consuming because it requires collecting large amounts of data, leaving RL prone to sample inefficiency and poor generalization. Furthermore, constructing an explicit reward function that accounts for the trade-off between the multiple desiderata of a decision problem is often laborious. These challenges have recently been addressed through transfer and inverse reinforcement learning (T-IRL). Accordingly, this paper provides a comprehensive review of how T-IRL can improve the sample efficiency and generalization of RL algorithms. Following a brief introduction to RL, the fundamental T-IRL methods are presented and the most recent advances in each research field are reviewed extensively. Our findings indicate that most recent work addresses the aforementioned challenges by employing human-in-the-loop and sim-to-real strategies for the efficient transfer of knowledge from source domains to the target domain under the transfer learning scheme. Under the IRL structure, training schemes that require few experience transitions, and the extension of such frameworks to multi-agent and multi-intention problems, have been the priorities of researchers in recent years.