LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)

Wei Luo, Yiting Lu, Xin Li, Haoran Li, Fengbin Guan, Chen Gao, Xin Jin, Yong Li, Zhibo Chen, Sijing Wu, Kang Fu, Yunhao Li, Ziang Xiao, Huiyu Duan, Jing Liu, Qiang Hu, Xiongkuo Min, Guangtao Zhai, Manxi Sun, Zixuan Guo, Yun Li, Ziyang Chen, Manabu Tsukada, Zhengyang Li, Zhenglin Du, Yi Wen, Licheng Jiao, Fang Liu, Lingling Li, Yiwen Ren, Zhilong Song, Dubing Chen, Yucheng Zhou, Tianyi Yan, Huan Zheng

This paper reports on the LoViF 2026 PhyScore challenge, a competition on holistic quality assessment of world-model-generated videos in both 2D and 4D generation settings. The challenge is motivated by a central gap in current evaluation practice: perceptual quality alone is insufficient to judge whether generated dynamics are physically plausible, temporally coherent, and consistent with the input conditions. Participants must build a metric that jointly predicts four dimensions: Video Quality, Physical Realism, Condition-Video Alignment, and Temporal Consistency. Beyond this, participants must also localize the timestamps of physical anomalies to enable fine-grained diagnosis. The benchmark dataset contains 1,554 videos generated by seven representative world generative models, organized into three tracks (text-to-2D, image-to-4D, and video-to-4D) and spanning 26 categories. These categories explicitly cover physics-relevant scenarios, including dynamics, optics, and thermodynamics, alongside diverse real-world and creative content. To ensure label reliability, scores and anomaly timestamps are produced by trained human annotators and then checked by an additional automated quality-control pass. Evaluation covers both score prediction and anomaly localization, using a composite protocol that combines TimeStamp_IOU (for localization) with SRCC/PLCC (for score prediction). This report summarizes the challenge design and provides method-level insights from the submitted solutions.
