So You Think You Can Scale Up Autonomous Robot Data Collection?

A long-standing goal in robot learning is to develop methods for robots to acquire new skills autonomously. While reinforcement learning (RL) promises to enable autonomous data collection, it remains challenging to scale in the real world, partly because of the significant effort required for environment design and instrumentation, including designing reset functions or accurate success detectors. Imitation learning (IL) methods, on the other hand, require little to no environment design effort but demand significant human supervision in the form of collected demonstrations. To address these shortcomings, recent works in autonomous IL start with an initial seed dataset of human demonstrations from which an autonomous policy can bootstrap. While autonomous IL approaches promise to address the challenges of both autonomous RL and pure IL, in this work we posit that such techniques do not deliver on this promise and are still unable to scale up autonomous data collection in the real world. Through a series of real-world experiments, we demonstrate that these approaches, when scaled up to realistic settings, face many of the same environment-design scaling challenges as prior attempts in RL. Further, we perform a rigorous study of autonomous IL methods across different data scales and 7 simulation and real-world tasks, and demonstrate that while autonomous data collection can modestly improve performance, simply collecting more human data often provides significantly more improvement. Our work suggests a negative result: scaling up autonomous data collection for learning robot policies for real-world tasks is more challenging and less practical than prior work suggests. We hope these insights into the core challenges of scaling up data collection help inform future efforts in autonomous learning.
