Determining whether neural models internalize physical laws as world models, rather than exploiting statistical shortcuts, remains challenging, especially under out-of-distribution (OOD) shifts. Standard evaluations often test latent capability via downstream adaptation (e.g., fine-tuning or high-capacity probes), but such interventions can change the representations being measured and thus confound what was learned during self-supervised learning (SSL). We propose a non-invasive evaluation protocol, PhyIP. We test whether physical quantities are linearly decodable from frozen representations, motivated by the linear representation hypothesis. Across fluid dynamics and orbital mechanics, we find that when SSL achieves low error, latent structure becomes linearly accessible. PhyIP recovers internal energy and Newtonian inverse-square scaling on OOD tests (e.g., $\rho > 0.90$). In contrast, adaptation-based evaluations can collapse this structure ($\rho \approx 0.05$). These findings suggest that adaptation-based evaluation can obscure latent structures and that low-capacity probes offer a more accurate evaluation of physical world models.
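The core of the protocol above — fitting a low-capacity linear probe to frozen representations and scoring it with a rank correlation — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the embeddings, target quantity, dimensions, and ridge penalty are all synthetic/hypothetical, chosen so that the target is (noisily) linear in the representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen SSL embeddings: 512 samples, 64 dimensions.
# In PhyIP these would come from a pretrained encoder and stay frozen.
n, d = 512, 64
Z = rng.normal(size=(n, d))

# Synthetic "physical quantity" (e.g., internal energy) that is
# linearly encoded in the representation, plus small observation noise.
w_true = rng.normal(size=d)
energy = Z @ w_true + 0.1 * rng.normal(size=n)

# Train/test split; the probe never updates the representation Z.
Z_tr, Z_te = Z[:400], Z[400:]
y_tr, y_te = energy[:400], energy[400:]

# Low-capacity probe: closed-form ridge regression,
# w = (Z^T Z + lam * I)^{-1} Z^T y
lam = 1e-2
w = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(d), Z_tr.T @ y_tr)
pred = Z_te @ w

def spearman(a, b):
    """Spearman rank correlation, via Pearson correlation of ranks
    (valid here because the continuous data has no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

rho = spearman(pred, y_te)
print(f"Spearman rho of linear probe: {rho:.3f}")
```

Because the probe is a single linear map with a closed-form solution, it cannot reshape the representation it measures; a high $\rho$ on held-out (or OOD) data indicates the quantity was already linearly accessible in the frozen embedding, which is the contrast the abstract draws against adaptation-based evaluation.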