Fleets of robots ingest massive amounts of heterogeneous streaming data, held in separate silos and generated by interacting with their environments, far more than can easily be stored or transmitted. At the same time, teams of robots should co-acquire diverse skills through their heterogeneous experiences in varied settings. How can we enable such fleet-level learning without having to transmit or centralize fleet-scale data? In this paper, we investigate policy merging (PoMe) from such distributed heterogeneous datasets as a potential solution. To merge policies efficiently in the fleet setting, we propose FLEET-MERGE, an instantiation of distributed learning that accounts for the permutation invariance that arises when control policies are parameterized with recurrent neural networks. We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment, achieving good performance on nearly all training tasks at test time. Moreover, we introduce FLEET-TOOLS, a novel robotic tool-use benchmark for fleet policy learning in compositional and contact-rich robot manipulation tasks, and use it to validate the efficacy of FLEET-MERGE.
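The abstract does not describe how FLEET-MERGE handles permutation invariance. The minimal sketch below illustrates the underlying problem using generic weight matching followed by averaging. It is not the paper's algorithm. It assumes a toy single-layer Elman RNN cell stored as nested Python lists and searches exhaustively over hidden-unit permutations. The function names (`permute_hidden`, `align`, `average`, `rnn_outputs`) and all parameter values are hypothetical.

```python
import math
from itertools import permutations

# Toy Elman RNN cell stored as nested lists (hypothetical layout, not the paper's):
#   h_t = tanh(W_ih @ x_t + W_hh @ h_{t-1} + b),   y_t = W_out @ h_t
# Shapes: W_ih [H][D], W_hh [H][H], b [H], W_out [O][H].


def permute_hidden(params, perm):
    """Reorder hidden units so that new unit i is old unit perm[i].

    The cell computes exactly the same function afterwards; this is the
    permutation symmetry that naive weight averaging ignores."""
    W_ih, W_hh, b, W_out = params
    return (
        [W_ih[p][:] for p in perm],                  # permute input-weight rows
        [[W_hh[p][q] for q in perm] for p in perm],  # permute rows and columns
        [b[p] for p in perm],                        # permute biases
        [[row[p] for p in perm] for row in W_out],   # permute readout columns
    )


def flatten(params):
    """Concatenate all parameters into one flat list."""
    W_ih, W_hh, b, W_out = params
    return ([v for row in W_ih for v in row] + [v for row in W_hh for v in row]
            + list(b) + [v for row in W_out for v in row])


def align(ref, other):
    """Brute-force weight matching: return `other` with its hidden units
    permuted to maximize the inner product with `ref`. The search is
    exponential in H, so this only suits tiny illustrative cells."""
    ref_flat = flatten(ref)

    def score(perm):
        return sum(a * c for a, c in zip(ref_flat, flatten(permute_hidden(other, perm))))

    best = max(permutations(range(len(ref[2]))), key=score)
    return permute_hidden(other, list(best))


def average(p1, p2):
    """Element-wise mean of two parameter tuples with identical shapes."""
    def avg(x, y):
        if isinstance(x, list):
            return [avg(a, c) for a, c in zip(x, y)]
        return (x + y) / 2
    return tuple(avg(x, y) for x, y in zip(p1, p2))


def rnn_outputs(params, xs):
    """Run the cell over the input sequence xs and return the readouts."""
    W_ih, W_hh, b, W_out = params
    h = [0.0] * len(b)
    ys = []
    for x in xs:
        h = [math.tanh(sum(w * xi for w, xi in zip(W_ih[i], x))
                       + sum(w * hj for w, hj in zip(W_hh[i], h)) + b[i])
             for i in range(len(b))]
        ys.append([sum(w * hi for w, hi in zip(row, h)) for row in W_out])
    return ys


# Usage: policy_b is policy_a with its hidden units shuffled (same behavior).
policy_a = (
    [[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]],
    [[0.2, -0.1, 0.0], [0.3, 0.1, -0.2], [0.0, 0.4, 0.1]],
    [0.1, -0.3, 0.05],
    [[1.0, -0.5, 0.7]],
)
policy_b = permute_hidden(policy_a, [2, 0, 1])
naive = average(policy_a, policy_b)                    # mixes unrelated units
merged = average(policy_a, align(policy_a, policy_b))  # recovers policy_a here
```

In this toy case `merged` reproduces `policy_a` exactly only because `policy_b` is an exact permutation of it. For independently trained policies, the aligned weights will generally still differ, so averaging remains an approximation. Larger hidden layers would also need an assignment solver rather than an exhaustive search over permutations.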