Creating reinforcement learning agents that generalise effectively to new tasks is a key challenge in AI research. This paper introduces Fracture Cluster Options (FraCOs), a multi-level hierarchical reinforcement learning method that achieves state-of-the-art performance on difficult generalisation tasks. FraCOs identifies patterns in agent behaviour and forms options based on the expected future usefulness of those patterns, enabling rapid adaptation to new tasks. In tabular settings, FraCOs demonstrates effective transfer and improves performance as it grows in hierarchical depth. We evaluate FraCOs against state-of-the-art deep reinforcement learning algorithms in several complex procedurally generated environments. Our results show that FraCOs achieves higher in-distribution and out-of-distribution performance than competitors.