Developing world models that understand complex physical interactions is essential for advancing robotic planning and simulation. However, existing methods often struggle to accurately model the environment under data scarcity and complex, contact-rich dynamic motion. To address these challenges, we propose ContactGaussian-WM, a differentiable, physics-grounded rigid-body world model that learns intricate physical laws directly from sparse, contact-rich video sequences. Our framework consists of two core components: (1) a unified Gaussian representation for both visual appearance and collision geometry, and (2) an end-to-end differentiable learning framework that differentiates through a closed-form physics engine to infer physical properties from sparse visual observations. Extensive simulations and real-world evaluations demonstrate that ContactGaussian-WM outperforms state-of-the-art methods on complex scenarios and exhibits robust generalization. Furthermore, we showcase the practical utility of our framework in downstream applications, including data synthesis and real-time model predictive control (MPC).
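To give intuition for the "differentiating through a closed-form physics engine" idea, the following is a minimal, self-contained sketch (not the paper's implementation): a 1-D bounce with restitution coefficient `e` has the closed-form rebound height `h1 = e² · h0`, so a contact parameter can be recovered by gradient descent on a squared error against an observed rebound. The function names, the 1-D setup, and the analytic gradient are illustrative assumptions, not part of ContactGaussian-WM.

```python
# Hypothetical sketch: inferring a physical parameter (restitution) by
# differentiating through a closed-form contact model. Not the paper's code.

def bounce_height(e, h0):
    """Closed-form physics step: rebound height after one bounce."""
    return e * e * h0

def grad_loss(e, h0, h_obs):
    """Analytic gradient of (bounce_height(e, h0) - h_obs)**2 w.r.t. e."""
    residual = bounce_height(e, h0) - h_obs
    return 2.0 * residual * (2.0 * e * h0)

def infer_restitution(h0, h_obs, e=0.5, lr=0.05, steps=500):
    """Gradient descent on the restitution coefficient."""
    for _ in range(steps):
        e -= lr * grad_loss(e, h0, h_obs)
    return e

if __name__ == "__main__":
    h0, h_obs = 1.0, 0.64              # drop height and observed rebound (m)
    e_hat = infer_restitution(h0, h_obs)
    print(round(e_hat, 3))              # true restitution is sqrt(0.64) = 0.8
```

In the actual framework, the observation would be a rendered Gaussian scene compared against video frames, and the gradient would flow through the full differentiable physics engine rather than a one-line closed form; this toy recovers the same inverse-problem structure.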