WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making

World models play a crucial role in decision-making within embodied environments, enabling low-cost exploration in place of interaction that would be expensive in the real world. To facilitate effective decision-making, world models must generalize well enough to support faithful imagination in out-of-distribution (OOD) regions, and must provide reliable uncertainty estimation to assess the credibility of the simulated experiences. Both requirements present significant challenges for prior scalable approaches. This paper introduces WHALE, a framework for learning generalizable world models, consisting of two key techniques: behavior-conditioning and retracing-rollout. Behavior-conditioning addresses policy distribution shift, one of the primary sources of world model generalization error, while retracing-rollout enables efficient uncertainty estimation without requiring model ensembles. These techniques are universal and can be combined with any neural network architecture for world model learning. Incorporating these two techniques, we present Whale-ST, a scalable spatial-temporal transformer-based world model with enhanced generalizability. We demonstrate the superiority of Whale-ST in simulation tasks by evaluating both value estimation accuracy and video generation fidelity. Additionally, we examine the effectiveness of our uncertainty estimation technique, which enhances model-based policy optimization in fully offline scenarios. Furthermore, we propose Whale-X, a 414M-parameter world model trained on 970K trajectories from the Open X-Embodiment datasets. We show that Whale-X exhibits promising scalability and strong generalizability in real-world manipulation scenarios using minimal demonstrations.
