Sim2Real transfer has gained popularity because it enables transfer from inexpensive simulators to the real world. This paper presents a novel approach that fuses components of a traditional World Model into a robust system that is trained entirely within a simulator and transfers zero-shot to the real world. To facilitate transfer, we use an intermediate representation based on \textit{Bird's Eye View (BEV)} images. Our robot learns to navigate in a simulator by first learning to translate complex \textit{First-Person View (FPV)} RGB images into BEV representations, and then learning to navigate using those representations. When tested in the real world, the robot uses the perception model to translate FPV RGB images into the embeddings learned by the FPV-to-BEV translator, which the downstream policy can use. State-checking modules based on \textit{Anchor images} and a Mixture Density LSTM not only interpolate uncertain and missing observations but also enhance the robustness of the model in the real world. We train the model on data from a differential-drive robot in the CARLA simulator and demonstrate the effectiveness of our method by deploying the trained models on a real-world differential-drive robot. Lastly, we release a comprehensive codebase, dataset, and models for training and deployment (\url{https://sites.google.com/view/value-explicit-pretraining}).
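The abstract does not specify how the Mixture Density LSTM is parameterised or trained. As a minimal sketch of the general idea only, the following shows the negative log-likelihood that a mixture density output head typically minimises, here for a scalar target under a 1-D Gaussian mixture. The function name `mdn_nll` and its arguments are hypothetical and are not taken from the released codebase; in a full model the mixture parameters would be emitted by the LSTM at each time step.

```python
import math


def mdn_nll(target, logits, means, log_stds):
    """Negative log-likelihood of a scalar target under a 1-D Gaussian mixture.

    logits, means, log_stds are equal-length lists with one entry per
    mixture component (hypothetical interface, for illustration only).
    """
    # Stable log-softmax normaliser for the mixture weights.
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))

    log_terms = []
    for logit, mu, log_sigma in zip(logits, means, log_stds):
        log_weight = logit - log_norm
        z = (target - mu) / math.exp(log_sigma)
        # Log-density of a univariate Gaussian at the target.
        log_gauss = -0.5 * z * z - log_sigma - 0.5 * math.log(2.0 * math.pi)
        log_terms.append(log_weight + log_gauss)

    # Log-sum-exp over components gives the mixture log-likelihood.
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))
```

A mixture output of this kind lets the model represent several plausible next observations rather than a single point estimate, which is one common motivation for using it when observations are uncertain or missing.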