Multitask Reinforcement Learning (MTRL) approaches have gained increasing attention for their wide applications in many important Reinforcement Learning (RL) tasks. However, while recent advancements in MTRL theory have focused on improved statistical efficiency under the assumption of a shared structure across tasks, exploration, a crucial aspect of RL, has been largely overlooked. This paper addresses this gap by showing that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with a myopic exploration design such as $\epsilon$-greedy, which is inefficient in general, can be sample-efficient for MTRL. To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL, and it may also shed light on the enigmatic success of the wide use of myopic exploration in practice. To validate the role of diversity, we conduct experiments on synthetic robotic control environments, where the diverse task set aligns with the task selection of automatic curriculum learning, which is empirically shown to improve sample efficiency.
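To make the setup concrete, the following is a minimal sketch (not the paper's actual algorithm or environments) of the generic scheme the abstract describes: per-task value functions whose greedy policies are shared as behavior policies across tasks, combined with myopic $\epsilon$-greedy exploration, trained round-robin on a diverse task set. The chain-MDP tasks, tabular Q-learning backbone, and all names and hyperparameters below are illustrative assumptions.

```python
# Illustrative sketch of policy-sharing MTRL with myopic epsilon-greedy
# exploration. Tasks are chain MDPs that differ only in their goal state,
# so a diverse set of goals induces greedy policies that jointly cover
# the state space. All details here are assumptions, not the paper's setup.
import random

N_STATES, N_ACTIONS, HORIZON = 10, 2, 20  # small chain MDP (assumed)

def step(goal, state, action):
    """Deterministic chain: action 1 moves right, 0 moves left.
    Reward 1 only at the task-specific goal state; the goal is what
    distinguishes tasks in this sketch."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward

def epsilon_greedy(q_row, epsilon):
    """Myopic exploration: a random action w.p. epsilon, else greedy."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q_row[a])

def train_policy_sharing(task_goals, episodes=3000, epsilon=0.1,
                         alpha=0.5, gamma=0.99):
    n_tasks = len(task_goals)
    # One Q-table per task; "policy sharing" here means any task's greedy
    # policy may serve as the behavior policy for any other task.
    qs = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(n_tasks)]
    for ep in range(episodes):
        i = ep % n_tasks               # task being trained (round-robin)
        j = random.randrange(n_tasks)  # task whose shared policy drives behavior
        state = 0
        for _ in range(HORIZON):
            # Act epsilon-greedily w.r.t. task j's values, learn task i's values.
            action = epsilon_greedy(qs[j][state], epsilon)
            next_state, reward = step(task_goals[i], state, action)
            # Standard Q-learning update for the trained task.
            target = reward + gamma * max(qs[i][next_state])
            qs[i][state][action] += alpha * (target - qs[i][state][action])
            state = next_state
    return qs

if __name__ == "__main__":
    # Goals spread across the chain play the role of a diverse task set.
    qs = train_policy_sharing(task_goals=[2, 5, 9])
    greedy = [max(range(N_ACTIONS), key=lambda a: qs[-1][s][a])
              for s in range(N_STATES)]
    print("greedy actions for the farthest goal (goal=9):", greedy)
```

In this toy version, data gathered while following another task's greedy policy can reach states that $\epsilon$-greedy exploration on a single task alone would rarely visit, which is the intuition behind the exploration benefit the abstract claims for diverse task sets.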