Reinforcement learning (RL) is crucial for decision-making in data science, but it suffers from sample inefficiency, particularly in real-world scenarios where physical interactions are costly. This paper introduces a novel human-inspired framework for improving the sample efficiency of RL algorithms. The framework first exposes the learning agent to simpler tasks that progressively increase in complexity and ultimately lead to the main task. It requires no pre-training, and each simpler task is learned for only one iteration. The resulting knowledge can support various transfer learning approaches, such as value and policy transfer, without increasing computational complexity. The framework applies across different goals, environments, and RL algorithms, including value-based, policy-based, tabular, and deep RL methods. Experimental evaluations on both a simple Random Walk and more complex constrained optimal control problems show that the framework improves sample efficiency, especially when the main task is challenging.
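To make the idea of learning progressively harder tasks with value transfer concrete, the following is a minimal sketch and not the paper's implementation. It assumes a tabular Q-learning agent on a one-dimensional chain with a goal at the right end, treats "one iteration" as one episode, and builds simpler tasks by starting the agent closer to the goal. The function names (`run_episode`, `learn_with_curriculum`), the task construction, and all hyperparameters are illustrative assumptions. Value transfer is modeled in the simplest possible way: every task updates the same Q-table.

```python
import random

LEFT, RIGHT = 0, 1


def run_episode(q, start, goal, rng, alpha=0.5, gamma=0.9,
                epsilon=0.1, max_steps=10_000):
    """Run one tabular Q-learning episode on a 1-D chain; return steps taken."""
    state = start
    for step in range(1, max_steps + 1):
        # Epsilon-greedy action selection; ties are broken at random.
        if rng.random() < epsilon or q[state][LEFT] == q[state][RIGHT]:
            action = rng.choice((LEFT, RIGHT))
        else:
            action = LEFT if q[state][LEFT] > q[state][RIGHT] else RIGHT
        # Deterministic move; the left end of the chain acts as a wall.
        next_state = max(0, state - 1) if action == LEFT else state + 1
        done = next_state == goal
        reward = 1.0 if done else 0.0
        # Standard Q-learning target: no bootstrapping from the terminal state.
        target = reward if done else gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        if done:
            return step
        state = next_state
    return max_steps  # safety cap reached without hitting the goal


def learn_with_curriculum(n_states=15, main_episodes=20, seed=0):
    """Learn each simpler task for one episode, then learn the main task."""
    rng = random.Random(seed)
    goal = n_states - 1
    # One shared table: knowledge from simpler tasks carries over directly
    # (an assumed, very basic form of value transfer).
    q = [[0.0, 0.0] for _ in range(n_states)]
    # Simpler tasks (assumed construction): start next to the goal,
    # then move the start one state farther away each time.
    for start in range(goal - 1, 0, -1):
        run_episode(q, start, goal, rng)
    # Main task: start at the far end of the chain.
    steps = [run_episode(q, 0, goal, rng) for _ in range(main_episodes)]
    return q, steps


if __name__ == "__main__":
    q_table, steps_per_episode = learn_with_curriculum()
    print("steps per main-task episode:", steps_per_episode)
```

Skipping the curriculum loop and running only the main-task episodes on a zero-initialized table gives a baseline for comparing steps per episode. Any difference seen in this toy setting is illustrative only and says nothing about the results reported in the paper.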