Can we learn the physics of matter in motion directly from images and video--and trust it? Answering this question requires integrating experiments, physics-based simulation, and data across traditionally separate disciplines. Much of this knowledge is visual and temporal rather than textual: images and videos encode structure, dynamics, and causality that equations alone cannot fully capture. Recent generative models produce compelling visual content, yet they rely on observational data and often lack physical validity. Here we show that generative video models gain scientific value when they couple visual data with experiments and high-fidelity simulations. Using deformation mechanics as a testbed, we study three systems of increasing complexity--rubber compression, can crushing, and cardiac motion--and identify the regimes in which visual learning succeeds, fails, or requires mechanistic supervision. When physics manifests in visible kinematics, generative models recover measurable quantities such as surface strain; when internal state variables dominate, visual plausibility no longer ensures physical admissibility. We propose that the convergence of visual generation, experiments, and simulation defines a new frontier, the Generative Sciences of Matter and Motion, which unifies Simulogenics, Physiogenics, and Materiogenics. Physics-grounded foundation models of this kind can turn visual generation into a scientific instrument for inference, prediction, and design of matter in motion.