Generalization of World Models under Environmental Variability for Vision-based Quadrotor Navigation

World models, learned generative models that predict how an environment evolves, have become a promising tool for sample-efficient robot learning. Yet how robust they are to environmental variability remains poorly understood. To address this, we conduct a systematic study using vision-based quadrotor navigation as a testbed problem, training DreamerV3-based world models under varying levels of environmental randomness and evaluating them across all levels through cross-environment validation, spanning both Self-Supervised Learning (SSL) pretraining and Reinforcement Learning (RL) fine-tuning. We then deploy all world models and associated navigation policies on a real quadrotor in unseen environments, including an open-loop run where the model receives just 2.5 s of real sensory input before all sensors are cut off, leaving the system to navigate entirely in imagination over a 12 m traverse. Our results show that world model robustness during SSL pretraining is a strong predictor of sim-to-real transfer: every model that generalized well in cross-environment SSL validation deployed successfully in the real world, passing through gaps as narrow as 0.67 m, whereas the model that dominated simulation policy evaluation failed on the real platform. We further identify (a) the discrete latent size and (b) the training-sequence length as the dominant factors governing world model quality.
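The cross-environment validation described above (train one model per randomness level, then evaluate every model on every level) can be pictured as filling a train-by-eval matrix. The following is a minimal sketch of that bookkeeping only, not the authors' code: the function names, the `models` mapping, and the toy `evaluate` loss are all hypothetical stand-ins, and the abstract does not specify which validation metric is actually used.

```python
from typing import Callable, Dict, List, Sequence


def cross_env_matrix(
    models: Dict[int, object],
    levels: Sequence[int],
    evaluate: Callable[[object, int], float],
) -> List[List[float]]:
    """Row i holds the loss of the model trained on levels[i],
    evaluated on every level in `levels` (lower is better)."""
    return [[evaluate(models[train], ev) for ev in levels] for train in levels]


def generalization_summary(matrix: List[List[float]]) -> List[Dict[str, float]]:
    """Per trained model: in-distribution loss (diagonal entry), mean loss
    on the other levels (off-diagonal entries), and worst-case loss."""
    n = len(matrix)
    summary = []
    for i, row in enumerate(matrix):
        off_diag = [row[j] for j in range(n) if j != i]
        summary.append(
            {
                "in_dist": row[i],
                "off_diag_mean": sum(off_diag) / len(off_diag),
                "worst": max(row),
            }
        )
    return summary


# Toy usage (assumption): each "model" is represented only by the randomness
# level it was trained on, and the placeholder loss grows with the mismatch
# between training and evaluation level. Real use would plug in trained
# world models and a real validation loss.
levels = [0, 1, 2]
models = {lvl: lvl for lvl in levels}


def toy_loss(model: object, eval_level: int) -> float:
    return 1.0 + abs(model - eval_level)


matrix = cross_env_matrix(models, levels, toy_loss)
for lvl, stats in zip(levels, generalization_summary(matrix)):
    print(lvl, stats)
```

In this layout, a model that generalizes well has an off-diagonal mean close to its diagonal entry. That gap is one simple way to rank models by robustness before any real-world deployment.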

