The ability to leverage shared behaviors between tasks is critical for sample-efficient multi-task reinforcement learning (MTRL). While prior methods have primarily explored parameter and data sharing, direct behavior sharing has been limited to task families requiring similar behaviors. Our goal is to extend the efficacy of behavior sharing to more general task families that could require a mix of shareable and conflicting behaviors. Our key insight is that an agent's behavior across tasks can be used for mutually beneficial exploration. To this end, we propose a simple MTRL framework for identifying shareable behaviors across tasks and incorporating them to guide exploration. We empirically demonstrate that behavior sharing improves sample efficiency and final performance on manipulation and navigation MTRL tasks and that it complements parameter sharing. Result videos are available at https://sites.google.com/view/qmp-mtrl.