Generalist robot manipulation policies (GMPs) have the potential to generalize across a wide range of tasks, devices, and environments. However, existing policies continue to struggle in out-of-distribution scenarios due to the inherent difficulty of collecting action data that sufficiently covers extensively diverse domains. While fine-tuning offers a practical way to quickly adapt a GMP to novel domains and tasks with limited samples, we observe that the performance of the resulting GMP differs significantly depending on the design choices of the fine-tuning strategy. In this work, we first conduct an in-depth empirical study of the key factors in GMP fine-tuning strategies, covering the action space, policy head, supervision signal, and choice of tunable parameters, evaluating 2,500 rollouts for each configuration. We systematically discuss and summarize our findings and identify the key design choices, which we believe provide a practical guideline for GMP fine-tuning. We observe that in a low-data regime, with carefully chosen fine-tuning strategies, a GMP significantly outperforms state-of-the-art imitation learning algorithms. The results presented in this work establish a new baseline for future studies on fine-tuned GMPs, and provide a significant addition to the GMP toolbox for the community.