Training general robotic policies from heterogeneous data for different tasks is a significant challenge. Existing robotic datasets vary in modality (color, depth, tactile, and proprioceptive information) and are collected in different domains, such as simulation, real robots, and human videos. Current methods usually collect and pool all data from one domain to train a single policy to handle such heterogeneity in tasks and domains, which is prohibitively expensive and difficult. In this work, we present a flexible approach, dubbed Policy Composition, that combines information across such diverse modalities and domains to learn scene-level and task-level generalized manipulation skills by composing different data distributions represented with diffusion models. Our method can use task-level composition for multi-task manipulation and can be composed with analytic cost functions to adapt policy behaviors at inference time. We train our method on simulation, human, and real-robot data and evaluate it on tool-use tasks. The composed policy achieves robust and dexterous performance under varying scenes and tasks and outperforms baselines trained on a single data source in both simulation and real-world experiments. See https://liruiw.github.io/policycomp for more details.
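The abstract does not spell out the composition rule, so the following is only a minimal sketch of one common way to compose distributions represented by diffusion models: summing their scores with weights, which corresponds to sampling from a product of the individual densities, and subtracting the gradient of an analytic cost at inference time. Everything here is an illustrative assumption rather than the authors' implementation: the 1-D Gaussian "policies" stand in for trained diffusion models, the names `gaussian_score`, `compose_scores`, `cost_gradient`, and `sample` are hypothetical, and the sampler uses a single noise level instead of a full denoising schedule.

```python
import random


def gaussian_score(mean, var):
    """Score d/da log N(a; mean, var) of a 1-D Gaussian.

    Hypothetical stand-in for the score of one trained diffusion policy.
    """
    return lambda a: (mean - a) / var


def compose_scores(scores, weights):
    """Weighted sum of scores, i.e. the score of prod_i p_i(a)^w_i."""
    return lambda a: sum(w * s(a) for s, w in zip(scores, weights))


def cost_gradient(a, limit=1.5, scale=10.0):
    """Gradient of the assumed cost 0.5 * scale * max(0, a - limit)^2."""
    return scale * max(0.0, a - limit)


def sample(score, cost_grad=None, steps=300, step_size=0.01, seed=0):
    """Langevin sampling at a single noise level (a simplification)."""
    rng = random.Random(seed)
    a = rng.gauss(0.0, 1.0)
    for _ in range(steps):
        grad = score(a)
        if cost_grad is not None:
            grad -= cost_grad(a)  # steer samples away from high-cost actions
        a += step_size * grad + (2.0 * step_size) ** 0.5 * rng.gauss(0.0, 1.0)
    return a


# Two assumed data sources with different action preferences.
sim_policy = gaussian_score(mean=0.0, var=1.0)   # e.g. trained on simulation
real_policy = gaussian_score(mean=2.0, var=1.0)  # e.g. trained on real data
composed = compose_scores([sim_policy, real_policy], weights=[1.0, 1.0])

# Sample actions from the composed distribution, guided by the cost.
samples = [sample(composed, cost_gradient, seed=i) for i in range(200)]
```

In this toy setting the composed score belongs to a Gaussian centered between the two sources, and the cost term only ever pushes samples downward once they exceed `limit`. Changing `weights` shifts the composed distribution toward one source, which is one plausible way to trade off data sources without retraining.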