Hierarchical reinforcement learning composes subpolicies at different levels of a hierarchy to accomplish complex tasks. Automated subpolicy discovery, which does not depend on domain knowledge, is a promising approach to generating subpolicies. However, existing methods can hardly deal with the degradation problem, in which subpolicies collapse into near-identical behaviors, because they either do not consider diversity or employ weak regularizers. In this paper, we propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among their action distributions. The proposed WDER can be easily incorporated into the loss function of existing methods to further boost their performance. Experimental results demonstrate that WDER improves performance and sample efficiency over prior work without modifying hyperparameters, which indicates its applicability and robustness.
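The abstract describes WDER only at a high level: maximize Wasserstein distances among subpolicy action distributions and add that term to an existing loss. The sketch below illustrates the idea under assumptions that are ours, not the paper's. Each subpolicy is assumed to output a diagonal Gaussian over continuous actions at a given state. The distance used is the 2-Wasserstein distance, chosen because it has a closed form for Gaussians. Pairwise distances are averaged, and a hypothetical coefficient `coef` weights the resulting term. All function names are illustrative. Plain floats are used here to show the arithmetic. A practical version would compute the term on batched tensors in an autodiff framework so that gradients reach the subpolicy parameters.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Assumption: a subpolicy's action distribution is a diagonal Gaussian,
# represented as (mean vector, standard-deviation vector).
Gaussian = Tuple[Sequence[float], Sequence[float]]


def w2_diag_gaussian(p: Gaussian, q: Gaussian) -> float:
    """2-Wasserstein distance between two diagonal Gaussians.

    For N(m1, diag(s1^2)) and N(m2, diag(s2^2)) the closed form is
    W2^2 = ||m1 - m2||^2 + ||s1 - s2||^2.
    """
    (m1, s1), (m2, s2) = p, q
    sq = sum((a - b) ** 2 for a, b in zip(m1, m2))
    sq += sum((a - b) ** 2 for a, b in zip(s1, s2))
    return math.sqrt(sq)


def diversity_regularizer(dists: List[Gaussian]) -> float:
    """Mean pairwise W2 distance among the subpolicies' action distributions."""
    pairs = list(itertools.combinations(dists, 2))
    if not pairs:  # a single subpolicy has no diversity to measure
        return 0.0
    return sum(w2_diag_gaussian(p, q) for p, q in pairs) / len(pairs)


def regularized_loss(base_loss: float, dists: List[Gaussian], coef: float = 0.01) -> float:
    """Hypothetical combination: subtract the diversity term, so that
    minimizing the loss maximizes the distances between subpolicies."""
    return base_loss - coef * diversity_regularizer(dists)
```

For example, two subpolicies with means (0, 0) and (3, 4) and equal unit standard deviations are at W2 distance 5, so with `coef=0.1` a base loss of 1.0 becomes 0.5. Because the term is subtracted, an optimizer minimizing the combined loss is pushed toward subpolicies whose action distributions lie further apart, which is the diversity pressure the abstract describes.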