Meta-reinforcement learning (meta-RL) aims to quickly solve new tasks by leveraging knowledge from prior tasks. However, previous studies often assume a single-mode, homogeneous task distribution, ignoring possible structured heterogeneity among tasks. Leveraging such structure can better facilitate knowledge sharing among related tasks and thus improve sample efficiency. In this paper, we explore the structured heterogeneity among tasks via clustering to improve meta-RL. We develop a dedicated exploratory policy to discover task structures via divide-and-conquer. Knowledge of the identified clusters helps narrow the search space of task-specific information, leading to more sample-efficient policy adaptation. Experiments on various MuJoCo tasks show that the proposed method can effectively uncover cluster structures in both rewards and state dynamics, demonstrating strong advantages over a set of state-of-the-art baselines.