We introduce DualMind, a generalist agent designed to tackle a wide range of decision-making tasks while addressing challenges faced by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning. DualMind uses a novel "Dual-phase" training strategy that emulates how humans learn to act in the world. The model first learns fundamental common knowledge through a self-supervised objective tailored for control tasks. It then learns how to make decisions in different contexts by imitating behaviors conditioned on given prompts. DualMind handles tasks across domains, scenes, and embodiments with a single set of model weights, and it supports zero-shot prompting without task-specific fine-tuning. Extensive experiments on MetaWorld and Habitat show that DualMind generalizes better than previous techniques, outperforming other generalist agents by over 50$\%$ on Habitat and over 70$\%$ on MetaWorld. On the 45 MetaWorld tasks, DualMind achieves a 90$\%$ success rate on more than 30 of them.