Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large, diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain, analogous to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and in a simulation for efficient debugging. We evaluate prominent open-source offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.