LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

Intuitive physics understanding in video diffusion models plays an essential role in building general-purpose, physically plausible world simulators, yet accurately evaluating this capacity remains challenging because physics correctness is difficult to disentangle from visual appearance in generation. To this end, we introduce LikePhys, a training-free method that evaluates intuitive physics in video diffusion models by distinguishing physically valid videos from impossible ones, using the denoising objective as an ELBO-based likelihood surrogate on a curated dataset of valid-invalid pairs. By testing on our constructed benchmark of twelve scenarios spanning four physics domains, we show that our evaluation metric, Plausibility Preference Error (PPE), demonstrates strong alignment with human preference, outperforming state-of-the-art evaluator baselines. We then systematically benchmark intuitive physics understanding in current video diffusion models. Our study further analyses how model design and inference settings affect intuitive physics understanding and highlights domain-specific capacity variations across physical laws. Empirical results show that, although current models struggle with complex and chaotic dynamics, there is a clear trend of improvement in physics understanding as model capacity and inference settings scale.

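As a rough illustration of the pairwise idea in the abstract, the sketch below computes a preference error rate from per-video denoising losses. It is not the paper's implementation: the function name `plausibility_preference_error`, the assumption that a lower denoising loss stands in for a higher likelihood, the choice to count ties as errors, and the example loss values are all assumptions made here. How the losses themselves are obtained from a video diffusion model (noise levels, number of samples, averaging) is outside this sketch.

```python
from typing import Sequence


def plausibility_preference_error(
    valid_losses: Sequence[float],
    invalid_losses: Sequence[float],
) -> float:
    """Fraction of valid/invalid pairs in which the model fails to prefer the valid video.

    Assumption: each entry is a denoising loss for one video, used as a
    likelihood surrogate (lower loss ~ higher likelihood). A pair counts as an
    error when the valid video's loss is not strictly lower than the invalid
    video's loss (ties are treated as errors, which is a choice made here).
    """
    if len(valid_losses) != len(invalid_losses):
        raise ValueError("each valid video needs exactly one invalid counterpart")
    if not valid_losses:
        raise ValueError("at least one pair is required")

    errors = sum(
        1 for v, i in zip(valid_losses, invalid_losses) if v >= i
    )
    return errors / len(valid_losses)


# Hypothetical losses for three valid/invalid pairs (made-up numbers).
example_valid = [0.10, 0.50, 0.30]
example_invalid = [0.20, 0.40, 0.90]
example_ppe = plausibility_preference_error(example_valid, example_invalid)
```

Under these assumptions, a value near 0 would mean the model almost always assigns the physically valid video the lower loss, while a value near 0.5 would be indistinguishable from guessing.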
