World models, learned generative models that predict how an environment evolves, have become a promising tool for sample-efficient robot learning. Yet how robust they are to environmental variability remains poorly understood. To address this, we conduct a systematic study using vision-based quadrotor navigation as a testbed. We train DreamerV3-based world models under varying levels of environmental randomness and evaluate each model across all levels through cross-environment validation, covering both self-supervised learning (SSL) pretraining and reinforcement learning (RL) fine-tuning. We then deploy all world models and their associated navigation policies on a real quadrotor in unseen environments. These deployments include an open-loop run in which the model receives just 2.5 s of real sensory input before all sensors are cut off, leaving the system to navigate entirely in imagination over a 12 m traverse. Our results show that world model robustness during SSL pretraining is a strong predictor of sim-to-real transfer: every model that generalized well in cross-environment SSL validation deployed successfully in the real world, passing through gaps as narrow as 0.67 m, whereas the model that dominated simulation policy evaluation failed on the real platform. We further identify (a) the discrete latent size and (b) the training-sequence length as the dominant factors governing world model quality.