Sim2Real transfer has gained popularity because it allows models trained in inexpensive simulators to be deployed in the real world. This paper presents a novel system that fuses the components of a traditional \textit{World Model} into a robust system, trained entirely within a simulator, that transfers \textit{Zero-Shot} to the real world. To facilitate transfer, we use an intermediate representation based on \textit{Bird's Eye View (BEV)} images. Our robot thus learns to navigate in a simulator by first learning to translate complex \textit{First-Person View (FPV)} RGB images into BEV representations, and then learning to navigate using those representations. Later, when tested in the real world, the robot uses the perception model to translate FPV RGB images into embeddings that are consumed by the downstream policy. State-checking modules built on \textit{Anchor images} and a \textit{Mixture Density LSTM} not only interpolate uncertain and missing observations but also make the model more robust in real-world environments. We trained the model on data collected with a \textit{differential-drive} robot in the CARLA simulator. We demonstrate the effectiveness of our methodology by deploying the trained models on a \textit{real-world differential-drive} robot. Lastly, we publicly release a comprehensive codebase, dataset, and models for training and deployment.
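
As a rough illustration of the mixture density idea mentioned above (a generic textbook form, not necessarily the exact formulation used in this system), a Mixture Density LSTM with hidden state $h_t$ can model the next embedding $z_{t+1}$ as a $K$-component Gaussian mixture. The symbols $K$, $\pi_k$, $\mu_k$, and $\sigma_k$ are notation assumed here for the sketch:
\[
p(z_{t+1} \mid h_t) \;=\; \sum_{k=1}^{K} \pi_k(h_t)\, \mathcal{N}\!\left(z_{t+1};\, \mu_k(h_t),\, \sigma_k^{2}(h_t)\, I\right),
\qquad \pi_k(h_t) \ge 0, \quad \sum_{k=1}^{K} \pi_k(h_t) = 1 .
\]
Under this assumed form, a sample from the mixture (or its mean) could stand in for an embedding when the corresponding observation is missing or judged unreliable.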