Accurately simulating existing 3D objects and a wide variety of materials often demands expert knowledge and time-consuming physical parameter tuning to achieve the desired dynamic behavior. We introduce MotionPhysics, an end-to-end differentiable framework that infers plausible physical parameters from a user-provided natural language prompt for a chosen 3D scene of interest, removing the need for guidance from ground-truth trajectories or annotated videos. Our approach first utilizes a multimodal large language model to estimate material parameter values, which are constrained to lie within plausible ranges. We further propose a learnable motion distillation loss that extracts robust motion priors from pretrained video diffusion models while minimizing appearance and geometry inductive biases to guide the simulation. We evaluate MotionPhysics across more than thirty scenarios, including real-world, human-designed, and AI-generated 3D objects, spanning a wide range of materials such as elastic solids, metals, foams, sand, and both Newtonian and non-Newtonian fluids. We demonstrate that MotionPhysics produces visually realistic dynamic simulations guided by natural language, surpassing the state of the art while automatically determining physically plausible parameters. The code and project page are available at: https://wangmiaowei.github.io/MotionPhysics.github.io/.
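The abstract says the estimated material parameters are constrained to lie within plausible ranges, but it does not state how that constraint is enforced during optimization. The sketch below shows one common option, assumed here for illustration rather than taken from MotionPhysics: the value is reparameterized through a sigmoid in log space. An unconstrained variable `theta` can then be optimized freely while the physical value always stays inside `(lo, hi)`. The function names and the example Young's modulus range are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_bounded(theta: float, lo: float, hi: float) -> float:
    """Map an unconstrained theta into the range (lo, hi).

    Hypothetical helper: interpolates in log space, so parameters that span
    orders of magnitude (e.g. Young's modulus) are explored evenly.
    """
    log_lo, log_hi = math.log(lo), math.log(hi)
    return math.exp(log_lo + sigmoid(theta) * (log_hi - log_lo))


def to_unconstrained(value: float, lo: float, hi: float) -> float:
    """Inverse of to_bounded.

    Hypothetical helper: initializes theta from an estimate that lies
    strictly inside (lo, hi), such as an MLLM-proposed starting value.
    """
    log_lo, log_hi = math.log(lo), math.log(hi)
    s = (math.log(value) - log_lo) / (log_hi - log_lo)
    return math.log(s / (1.0 - s))  # logit(s)


if __name__ == "__main__":
    # Illustrative range and initial guess for Young's modulus (Pa), not from the paper.
    lo, hi = 1e4, 1e7
    theta = to_unconstrained(1e5, lo, hi)
    print(theta, to_bounded(theta, lo, hi))
```

With this design, a gradient-based optimizer updates `theta` without any projection step, and the bound holds by construction. The cost is that gradients shrink as the value approaches either end of the range.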