As autonomous systems are increasingly deployed in open and uncertain settings, there is a growing need for trustworthy world models that can reliably predict future high-dimensional observations. However, the latent representations learned by world models lack a direct mapping to meaningful physical quantities and dynamics, limiting their utility and interpretability in downstream planning, control, and safety verification. In this paper, we argue for a fundamental shift from physically informed to physically interpretable world models, and we crystallize four principles that leverage symbolic knowledge to achieve this goal: (1) structuring latent spaces according to the physical intent of variables, (2) learning aligned invariant and equivariant representations of the physical world, (3) adapting training to the varied granularity of supervision signals, and (4) partitioning generative outputs to support scalability and verifiability. We experimentally demonstrate the value of each principle on two benchmarks. This paper opens several intriguing research directions toward achieving and capitalizing on full physical interpretability in world models.