Learning robust and generalizable world models is crucial for efficient, scalable robotic control in real-world environments. In this work, we introduce a framework for learning world models that accurately capture complex, partially observable, and stochastic dynamics. The proposed method combines a dual-autoregressive mechanism with self-supervised training to achieve reliable long-horizon predictions without relying on domain-specific inductive biases, ensuring adaptability across diverse robotic tasks. We further propose a policy optimization framework that leverages the learned world model for efficient training in imagined environments and seamless deployment on real-world systems. By addressing long-horizon prediction, error accumulation, and sim-to-real transfer, this work advances model-based reinforcement learning and paves the way for adaptive, efficient robotic systems in real-world applications.