Scale buys interpolation; structure buys a certified horizon. A world model's average error says nothing about whether a particular prediction can be trusted, or for how long. For equivariant latent world models we give a computable, multi-step certificate of the predictable horizon: $T$-step rollout error is provably constant over each symmetry orbit (Theorem A) and stratified channel by channel by the predictor's Lyapunov spectrum, $T_j(\varepsilon)\sim\log(1/\varepsilon)/\lambda_j$. The horizon is two-sided -- a matching lower bound makes approximate equivariance provably horizon-limited -- and the certificate is exclusive to structure: orbit-constant error characterizes equivariance, so no non-equivariant model has it at any scale. Empirically, on 40-D Lorenz-96 only a $\mathbb{Z}_N$-equivariant network recovers the full Lyapunov spectrum ($R^2{=}0.98$); dense and recurrent baselines fail. Because the spectrum is faithful, the certificate is actionable a priori: under a fixed sensing budget a $c\times$-inflated certificate provably needs $c\times$ the budget, and the equivariant certificate meets a budget its inflated dense counterpart cannot -- with zero calibration data. The same read-out, unchanged, audits public pretrained world models without any training: TD-MPC2 checkpoints land on the certificate's own scope taxonomy -- calibrated where strongly expansive (ratio 0.94-1.02), optimistic where weakly expansive, correctly abstaining where contracting -- a map that a deployed monitor replicates cell by cell, out of sample. Across the official 1M-317M multitask ladder, calibration does not improve with parameter count. On V-JEPA 2-AC (1B, real robot data) the measured cross-check correctly overrides an over-promising tangent spectrum -- the cross-validated audit, not the raw number, is the deployable object. Scale buys interpolation, not a calibrated horizon.
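The per-channel scaling $T_j(\varepsilon)\sim\log(1/\varepsilon)/\lambda_j$ can be illustrated with a minimal Python sketch. It is not the paper's read-out: the function name `horizon_per_channel` is hypothetical, the exponents in the usage note are made up for illustration, and the sketch assumes the $\sim$ holds with proportionality constant 1. Channels with $\lambda_j \le 0$ return `None`, on the assumption that the scaling only applies to expanding channels.

```python
import math


def horizon_per_channel(lyapunov_exponents, eps):
    """Evaluate T_j(eps) = log(1/eps) / lambda_j for each channel.

    Assumptions (not from the source): the proportionality constant
    hidden in the '~' is taken to be 1, and channels with
    lambda_j <= 0 return None because the scaling is only
    meaningful for expanding directions.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    log_term = math.log(1.0 / eps)
    return [log_term / lam if lam > 0 else None for lam in lyapunov_exponents]
```

For example, `horizon_per_channel([1.6, 0.8, 0.1, -0.5], 1e-2)` yields a short horizon for the fastest channel, a horizon twice as long for the channel with half the exponent, and `None` for the contracting channel. Under this formula, multiplying every exponent by a factor $c$ divides each horizon by $c$.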