JEPA-family world models use a static predictor whose weights do not adapt when test-time dynamics diverge from training. We compare two mechanisms for incorporating accumulated experience into a JEPA predictor under distribution shift: operand-side injection (EI-JEPA), where a compressed experience representation is added as a residual to the predictor's hidden state, and operator-side modulation (EPM-JEPA), where the same representation generates low-rank LoRA deltas applied to the predictor's weights. On a pre-registered comparison (Moving MNIST, gravity shift), EPM-JEPA (D_shift^{n=50} = 0.7848 +/- 0.0078, three seeds) differs from EI-JEPA (0.8238) by delta = 4.74%. By our stated criterion this falls under Outcome C: a null result, which is itself a valid outcome. As a secondary, non-pre-registered observation, EPM-JEPA improves 1.90% over a no-memory baseline (0.8000), consistently across seeds, while EI-JEPA underperforms the baseline, indicating that the benefit is specific to weight-level modulation. Our primary contribution is a mechanism analysis: the D_shift^{n=50} trajectory reflects three independent dynamical processes (buffer cycling, EMA target drift, and an intrinsic LoRA settling transient of +0.021) rather than convergence to equilibrium. These findings motivate PEM-JEPA, a physics-grounded successor that addresses this dynamical-peak limitation.
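The contrast between the two mechanisms can be illustrated on a single linear layer. The sketch below is not the authors' implementation: the layer sizes, the projection `P`, and the generator matrices `G_a` and `G_b` are hypothetical stand-ins, and a real predictor would be a deeper network trained end to end. It only shows where the experience vector enters in each case.

```python
import numpy as np

rng = np.random.default_rng(0)
d_hidden, d_mem, rank = 8, 4, 2  # illustrative sizes (assumptions)

# Hypothetical frozen predictor layer, hidden state, and experience vector.
W = rng.normal(size=(d_hidden, d_hidden)) / np.sqrt(d_hidden)
h = rng.normal(size=d_hidden)   # predictor hidden state
m = rng.normal(size=d_mem)      # compressed experience representation


def operand_side(h, m, W, P):
    """EI-style sketch: add a projection of m to the hidden state as a residual."""
    return W @ (h + P @ m)


def operator_side(h, m, W, G_a, G_b, rank):
    """EPM-style sketch: m generates low-rank factors A and B; apply W + B @ A."""
    A = (G_a @ m).reshape(rank, W.shape[1])
    B = (G_b @ m).reshape(W.shape[0], rank)
    delta_W = B @ A  # rank is at most `rank`
    return (W + delta_W) @ h, delta_W


# Hypothetical learned maps from the experience vector (random here).
P = rng.normal(size=(d_hidden, d_mem)) * 0.1
G_a = rng.normal(size=(rank * d_hidden, d_mem)) * 0.1
G_b = rng.normal(size=(d_hidden * rank, d_mem)) * 0.1

y_inj = operand_side(h, m, W, P)
y_mod, delta_W = operator_side(h, m, W, G_a, G_b, rank)

print("rank of delta_W:", np.linalg.matrix_rank(delta_W))
```

In this linear toy case, operand-side injection shifts the output by `W @ P @ m`, an offset that does not depend on `h`, whereas operator-side modulation shifts it by `delta_W @ h`, which changes the map applied to every hidden state. Whether this distinction accounts for the reported gap is a question for the full analysis, not something the sketch demonstrates.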