Online reinforcement learning (RL) typically requires costly, high-stakes online interaction data to learn a policy for a target task. This prompts interest in leveraging historical data to improve sample efficiency. The historical data may come from outdated or related source environments with different dynamics. It remains unclear how to effectively use such data in the target task to provably enhance learning and sample efficiency. To address this, we propose a hybrid transfer RL (HTRL) setting, where an agent learns in a target environment while accessing offline data from a source environment with shifted dynamics. We show that, without information on the dynamics shift, general shifted-dynamics data, even with subtle shifts, does not reduce sample complexity in the target environment. However, with prior information on the degree of the dynamics shift, we design HySRL, a transfer algorithm that achieves problem-dependent sample complexity and outperforms pure online RL. Finally, our experimental results demonstrate that HySRL surpasses a state-of-the-art online RL baseline.
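To make the setting concrete, the following is a minimal sketch of the HTRL data setup in a tabular MDP. It is an illustration under assumptions, not the paper's construction: the names (`P_source`, `beta`, `collect`) are hypothetical, and the dynamics shift is modeled here as a simple convex mixture so that the per-state-action shift is bounded by a known parameter `beta`, standing in for the "prior information on the degree of the dynamics shift".

```python
import numpy as np

# Sketch of the HTRL data setup (illustrative assumptions, not the paper's
# construction): the agent holds a fixed offline dataset from a source MDP
# whose transition kernel is shifted relative to the target MDP, and also
# interacts online with the target MDP, where samples are costly.

rng = np.random.default_rng(0)
S, A = 5, 2  # small tabular MDP: 5 states, 2 actions


def random_kernel(rng):
    """Sample a transition kernel P[s, a] as a distribution over next states."""
    P = rng.random((S, A, S))
    return P / P.sum(axis=-1, keepdims=True)


P_target = random_kernel(rng)
# Assumed shift model: mix the target kernel with a perturbation, so the
# shift at every (s, a) is bounded by `beta` (the known shift degree).
beta = 0.1
P_source = (1 - beta) * P_target + beta * random_kernel(rng)


def collect(P, n, rng):
    """Collect n transitions (s, a, s') under uniformly random actions."""
    data, s = [], 0
    for _ in range(n):
        a = rng.integers(A)
        s_next = rng.choice(S, p=P[s, a])
        data.append((s, a, s_next))
        s = s_next
    return data


offline_source_data = collect(P_source, n=10_000, rng=rng)  # given upfront
online_target_data = collect(P_target, n=100, rng=rng)      # costly online samples
print(len(offline_source_data), len(online_target_data))
```

The hard-instance result in the abstract says that access to `offline_source_data` alone, without knowledge of `beta`, cannot reduce the number of online target samples needed in the worst case; HySRL's guarantee relies on such prior shift information.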