PhysInOne: Visual Physics Learning and Reasoning in One Suite

Siyuan Zhou, Hejun Wang, Hu Cheng, Jinxi Li, Dongsheng Wang, Junwei Jiang, Yixiao Jin, Jiayue Huang, Shiwei Mao, Shangjia Liu, Yafei Yang, Hongkang Song, Shenxing Wei, Zihui Zhang, Peng Huang, Shijie Liu, Zhengli Hao, Hao Li, Yitian Li, Wenqi Zhou, Zhihan Zhao, Zongqi He, Hongtao Wen, Shouwang Huang, Peng Yun, Bowen Cheng, Pok Kazaf Fu, Wai Kit Lai, Jiahao Chen, Kaiyuan Wang, Zhixuan Sun, Ziqi Li, Haochen Hu, Di Zhang, Chun Ho Yuen, Bing Wang, Zhihua Wang, Chuhang Zou, Bo Yang

Source: arXiv; CVPR 2026. Siyuan, Hejun, Hu, Jinxi, Dongsheng, Junwei, Yixiao, Jiayue, and Shiwei are co-first authors. Project page: https://vlar-group.github.io/PhysInOne.html

We present PhysInOne, a large-scale synthetic dataset addressing the critical scarcity of physically-grounded training data for AI systems. Unlike existing datasets limited to merely hundreds or thousands of examples, PhysInOne provides 2 million videos across 153,810 dynamic 3D scenes, covering 71 basic physical phenomena in mechanics, optics, fluid dynamics, and magnetism. Distinct from previous works, our scenes feature multi-object interactions against complex backgrounds, with comprehensive ground-truth annotations including 3D geometry, semantics, dynamic motion, physical properties, and text descriptions. We demonstrate PhysInOne's efficacy across four emerging applications: physics-aware video generation, long- and short-term future frame prediction, physical property estimation, and motion transfer. Experiments show that fine-tuning foundation models on PhysInOne significantly enhances physical plausibility, while also exposing critical gaps in modeling complex physical dynamics and estimating intrinsic properties. As the largest dataset of its kind, orders of magnitude beyond prior works, PhysInOne establishes a new benchmark for advancing physics-grounded world models in generation, simulation, and embodied AI.

