We present PhysInOne, a large-scale synthetic dataset that addresses the critical scarcity of physically grounded training data for AI systems. Whereas existing datasets contain only hundreds or thousands of examples, PhysInOne provides 2 million videos across 153,810 dynamic 3D scenes, covering 71 basic physical phenomena in mechanics, optics, fluid dynamics, and magnetism. Unlike previous works, our scenes feature multi-object interactions against complex backgrounds and come with comprehensive ground-truth annotations, including 3D geometry, semantics, dynamic motion, physical properties, and text descriptions. We demonstrate PhysInOne's efficacy on four emerging applications: physics-aware video generation, long- and short-term future frame prediction, physical property estimation, and motion transfer. Experiments show that fine-tuning foundation models on PhysInOne significantly improves physical plausibility. They also expose critical gaps in modeling complex physical dynamics and in estimating intrinsic physical properties. PhysInOne is the largest dataset of its kind, orders of magnitude larger than prior works, and it establishes a new benchmark for advancing physics-grounded world models in generation, simulation, and embodied AI.