We propose a hierarchical entity-centric framework for offline Goal-Conditioned Reinforcement Learning (GCRL) that combines subgoal decomposition with factored structure to solve long-horizon tasks in domains with multiple entities. Achieving long-horizon goals in complex environments remains a core challenge in Reinforcement Learning (RL). Domains with multiple entities are particularly difficult due to their combinatorial complexity. GCRL facilitates generalization across goals and the use of subgoal structure, but struggles with high-dimensional observations and combinatorial state spaces, especially under sparse rewards. We employ a two-level hierarchy composed of a value-based GCRL agent and a factored subgoal-generating conditional diffusion model. The RL agent and subgoal generator are trained independently and composed post hoc through selective subgoal generation based on the value function, making the approach modular and compatible with existing GCRL algorithms. We introduce new variations of benchmark tasks that highlight the challenges of multi-entity domains, and show that our method consistently boosts the performance of the underlying RL agent on image-based long-horizon tasks with sparse rewards, achieving over 150% higher success rates on the hardest task in our suite and generalizing to longer horizons and larger numbers of entities. Rollout videos are provided at: https://sites.google.com/view/hecrl
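The "selective subgoal generation based on the value function" described above can be illustrated with a minimal sketch (not the authors' code): a goal-conditioned value function judges whether the final goal is directly reachable, and only when it is not does a generative subgoal model propose an intermediate target. All names here (`toy_value`, `toy_subgoal`, `select_target`, `threshold`) are illustrative assumptions, shown on a toy 1-D state space in place of the image-based states and diffusion model used in the paper.

```python
# Toy 1-D illustration: states and goals are floats. The "value" is higher
# when the goal is nearby, and the toy "subgoal model" proposes the midpoint
# between the current state and the goal.
def toy_value(state, goal):
    """Stand-in for a learned goal-conditioned value function."""
    return 1.0 / (1.0 + abs(goal - state))

def toy_subgoal(state, goal):
    """Stand-in for a learned (e.g. diffusion-based) subgoal generator."""
    return (state + goal) / 2.0

def select_target(state, goal, value_fn, subgoal_model, threshold=0.5):
    """Pursue the final goal if it looks reachable; otherwise query a subgoal."""
    if value_fn(state, goal) >= threshold:
        return goal  # goal judged reachable: act toward it directly
    return subgoal_model(state, goal)  # otherwise pursue an intermediate subgoal

# Nearby goal is pursued directly; a distant goal is replaced by a subgoal.
print(select_target(0.0, 0.5, toy_value, toy_subgoal))  # -> 0.5
print(select_target(0.0, 4.0, toy_value, toy_subgoal))  # -> 2.0
```

Because the value function and subgoal generator interact only through this selection rule, either component can be swapped out independently, which is what makes the composition modular.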