Interactive world models that simulate object dynamics are crucial for robotics, VR, and AR. However, learning physics-consistent dynamics models from limited real-world video data remains a significant challenge, especially for deformable objects with spatially varying physical properties. To overcome this data scarcity, we propose PhysWorld, a novel framework that uses a simulator to synthesize physically plausible and diverse demonstrations for learning efficient world models. Specifically, we first construct a physics-consistent digital twin within an MPM (Material Point Method) simulator via constitutive model selection and global-to-local optimization of physical properties. Subsequently, we apply part-aware perturbations to the physical properties and generate various motion patterns for the digital twin, synthesizing extensive and diverse demonstrations. Finally, using these demonstrations, we train a lightweight GNN-based world model that is embedded with physical properties. The real video can then be used to further refine the physical properties. PhysWorld achieves accurate and fast future predictions for various deformable objects and generalizes well to novel interactions. Experiments show that PhysWorld achieves competitive performance while enabling inference 47 times faster than the recent state-of-the-art method, PhysTwin.
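To make the part-aware perturbation step concrete, the following is a minimal sketch and not the authors' implementation. It assumes each particle of the digital twin carries a scalar stiffness and an integer part label, and that every part is rescaled by one shared log-normal factor. The function name `perturb_by_part`, the parameters `scale` and `seed`, and the choice of a log-normal distribution are all hypothetical; the abstract does not specify how the perturbations are drawn.

```python
import math
import random


def perturb_by_part(stiffness, part_ids, scale=0.2, seed=0):
    """Rescale per-particle stiffness with one random factor per part.

    Assumption (not from the paper): factors are log-normal, so they stay
    positive and are centered around 1 in log space.
    """
    rng = random.Random(seed)
    factors = {}
    # Draw one multiplicative factor per distinct part label, in sorted
    # order so the result is reproducible for a given seed.
    for pid in sorted(set(part_ids)):
        factors[pid] = math.exp(rng.gauss(0.0, scale))
    # Every particle in a part shares that part's factor.
    return [k * factors[pid] for k, pid in zip(stiffness, part_ids)]


if __name__ == "__main__":
    stiffness = [1.0, 1.0, 2.0, 2.0, 3.0]
    part_ids = [0, 0, 1, 1, 1]
    print(perturb_by_part(stiffness, part_ids))
```

Calling the function with different seeds would yield different property variants of the same object. Under the assumptions above, each variant could then be simulated to produce one of the diverse demonstrations the abstract describes.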