By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks to a high level of performance, either zero-shot or with small task-specific datasets. While this capability has been demonstrated in other fields such as computer vision, natural language processing, and speech recognition, it remains to be shown in robotics, where the generalization capabilities of models are particularly critical because real-world robotic data is difficult to collect. We argue that one of the keys to the success of such general robotic models lies in open-ended, task-agnostic training combined with high-capacity architectures that can absorb all of the diverse robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalability properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer1.github.io