Offline reinforcement learning (RL) is a promising direction that allows RL agents to pre-train on large datasets, avoiding repeated, expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such datasets, since 1) it permits creating many tasks from few components, 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components, and 3) the compositional dimensions provide a notion of task relatedness. This paper provides four offline RL datasets for simulated robotic manipulation, created using the 256 tasks from CompoSuite [Mendez et al., 2022a]. Each dataset is collected from an agent with a different level of performance and consists of 256 million transitions. We provide training and evaluation settings for assessing an agent's ability to learn compositional task policies. Our benchmarking experiments on each setting show that current offline RL methods can learn the training tasks to some extent and that compositional methods significantly outperform non-compositional methods. However, current methods are still unable to extract the tasks' compositional structure to generalize to unseen tasks, showing a need for further research in offline compositional RL.
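As a rough illustration of how a few components can yield many tasks, the sketch below enumerates tasks as the Cartesian product of component choices. The axis names, the placeholder component names, and the split into four axes of four options each are assumptions chosen only so that the product matches the 256 tasks mentioned in the abstract; they are not taken from the text. The per-task transition count likewise assumes the 256 million transitions of one dataset are spread evenly across tasks.

```python
from itertools import product

# Hypothetical layout: four axes with four placeholder components each.
# This is an assumption for illustration; the abstract only states "256 tasks".
AXES = {
    axis: [f"{axis}_{i}" for i in range(4)]
    for axis in ("robot", "object", "obstacle", "objective")
}


def enumerate_tasks(axes):
    """Build every task as one component choice per axis."""
    names = list(axes)
    return [
        dict(zip(names, combo))
        for combo in product(*(axes[name] for name in names))
    ]


def shared_components(task_a, task_b):
    """Count the axes on which two tasks use the same component.

    A simple stand-in for the notion of task relatedness: more shared
    components means more closely related tasks.
    """
    return sum(task_a[axis] == task_b[axis] for axis in task_a)


tasks = enumerate_tasks(AXES)
num_components = sum(len(options) for options in AXES.values())

# Assumes an even split of one dataset's transitions across all tasks.
transitions_per_task = 256_000_000 // len(tasks)

print(len(tasks), num_components, transitions_per_task)
```

Under these assumptions, 16 components combine into 256 tasks, and two tasks can be compared by how many components they share, which is one simple way to read the "task relatedness" that the compositional dimensions provide.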