The rise of generalist large-scale models in natural language and vision has raised the expectation that a massive data-driven approach could achieve broader generalization in other domains, such as continuous control. In this work, we explore a method for learning a single policy that controls agents of various morphologies to solve various tasks by distilling a large amount of proficient behavioral data. To align the input-output (IO) interface across multiple tasks and diverse agent morphologies while preserving essential 3D geometric relations, we introduce the morphology-task graph, which treats observations, actions, and goals/tasks in a unified graph representation. We also develop MxT-Bench for fast large-scale behavior generation, which supports procedural generation of diverse morphology-task combinations with a minimal blueprint and a hardware-accelerated simulator. Through efficient representation and architecture selection on MxT-Bench, we find that a morphology-task graph representation coupled with a Transformer architecture improves multi-task performance compared to other baselines, including recent discrete tokenization, and provides better prior knowledge for zero-shot transfer or sample efficiency in downstream multi-task imitation learning. Our work suggests that large diverse offline datasets, a unified IO representation, and policy representation and architecture selection through supervised learning form a promising approach for studying and advancing morphology-task generalization.