Knowledge Transfer for Cross-Domain Reinforcement Learning: A Systematic Review

Reinforcement Learning (RL) provides a framework in which agents can be trained, via trial and error, to solve complex decision-making problems. Because they learn with little supervision, RL methods require large amounts of data, which makes them too expensive for many applications (e.g., robotics). By reusing knowledge from a different task, knowledge transfer methods offer a way to reduce training time in RL. Given the severe data scarcity, there has been growing interest in methods that, owing to their flexibility, can transfer knowledge across different domains (i.e., problems with different representations). However, identifying similarities and adapting knowledge across tasks from different domains requires matching their representations or finding domain-invariant features. These processes can be data-demanding, which poses the main challenge in cross-domain knowledge transfer: selecting and transforming knowledge in a data-efficient way so that it accelerates learning in the target task, despite significant differences across problems (e.g., robots with distinct morphologies). This review therefore presents a unifying analysis of methods focused on transferring knowledge across different domains. Using a taxonomy that categorizes works by their transfer approach and characterizes them by their data-assumption requirements, this article contributes 1) a comprehensive and systematic review of knowledge transfer methods for the cross-domain RL setting, 2) a categorization and characterization of such methods that supports an analysis based on relevant features such as their transfer approach and data requirements, and 3) a discussion of the main challenges in cross-domain knowledge transfer, together with future directions worth exploring to address them.
