We introduce DualMind, a generalist agent for diverse decision-making tasks that addresses challenges of current methods such as overfitting behaviors and dependence on task-specific fine-tuning. DualMind uses a novel "Dual-phase" training strategy that emulates how humans learn to act in the world. In the first phase, the model learns fundamental common knowledge through a self-supervised objective tailored to control tasks; in the second, it learns to make decisions in different contexts by imitating behaviors conditioned on given prompts. With a single set of model weights, DualMind handles tasks across domains, scenes, and embodiments, and it supports zero-shot prompting without task-specific fine-tuning. Extensive experiments on MetaWorld and Habitat demonstrate superior generalizability compared to previous techniques: DualMind outperforms other generalist agents by over 50$\%$ on Habitat and over 70$\%$ on MetaWorld. It also reaches a 90$\%$ success rate on more than 30 of the 45 MetaWorld tasks.