Privileged Foresight Distillation: Zero-Cost Future Correction for World Action Models

World action models jointly predict future video and actions during training, raising an open question about what role the future-prediction branch actually plays. A recent finding shows that this branch can be removed at inference with little to no loss on common manipulation benchmarks, suggesting that future information may act merely as a regularizer on the shared visual backbone. We propose instead that joint training induces an action-conditioned correction that privileged future observations impose on action denoising, and that current-only policies capture this correction only partially. To make this account precise, we formulate privileged foresight as a residual in the action-denoising direction (the difference between what a model predicts given the true future and what it predicts given only the current frame) and introduce \emph{Privileged Foresight Distillation (PFD)}, which transfers this residual from a training-time teacher into a small adapter on a current-only student. The teacher and student share the same backbone and differ only in the attention mask over video tokens; future video is never generated at inference. Empirically, PFD achieves consistent improvements on LIBERO and RoboTwin manipulation benchmarks while preserving the current-only inference interface at negligible added latency. Controlled experiments verify that this gain reflects a genuine future-conditioned correction rather than a side effect of capacity or regularization. This view reframes the role of future information in world action models: not as a target to predict, nor as a regularizer to absorb, but as a compressible correction to be distilled.
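
The residual formulation above can be illustrated with a toy sketch. It is not the paper's architecture: the single attention layer, the random weights, the linear adapter, the least-squares fit, and every name in it (`attend`, `denoise`, `distilled_student`, `CURRENT_ONLY`, `WITH_FUTURE`) are assumptions made for illustration. The only ingredients taken from the abstract are that teacher and student share one backbone, differ only in the attention mask over video tokens, and that a small adapter on the current-only student is trained to reproduce the teacher-minus-student residual.

```python
import numpy as np

rng = np.random.default_rng(0)
D, A, N_CUR, N_FUT = 8, 4, 3, 3  # token dim, action dim, token counts (all hypothetical)

# Shared "backbone" weights: teacher and student use exactly the same parameters.
W_q = rng.normal(size=(D, A))
W_out = rng.normal(size=(A, A + D)) / np.sqrt(A + D)

# The only teacher/student difference: which video tokens the action query may see.
CURRENT_ONLY = np.array([True] * N_CUR + [False] * N_FUT)
WITH_FUTURE = np.ones(N_CUR + N_FUT, dtype=bool)

def attend(query, tokens, mask):
    """Single-head attention over video tokens; masked tokens get zero weight."""
    scores = tokens @ query / np.sqrt(query.shape[0])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max())
    return (weights / weights.sum()) @ tokens

def features(noisy_action, tokens, mask):
    return np.concatenate([noisy_action, attend(W_q @ noisy_action, tokens, mask)])

def denoise(noisy_action, tokens, mask):
    """Toy action-denoising prediction under a given attention mask."""
    return W_out @ features(noisy_action, tokens, mask)

def make_sample():
    current = rng.normal(size=(N_CUR, D))
    # Assumption: the future is partly predictable from the current frame.
    future = 0.9 * current + 0.1 * rng.normal(size=(N_FUT, D))
    return rng.normal(size=A), np.vstack([current, future])

samples = [make_sample() for _ in range(256)]
teacher = np.stack([denoise(a, v, WITH_FUTURE) for a, v in samples])
student = np.stack([denoise(a, v, CURRENT_ONLY) for a, v in samples])

# Privileged-foresight residual: future-conditioned minus current-only prediction.
residual = teacher - student

# Small linear adapter on current-only features, fit by least squares
# (a stand-in for whatever distillation loss the actual method uses).
phi = np.stack([features(a, v, CURRENT_ONLY) for a, v in samples])
adapter, *_ = np.linalg.lstsq(phi, residual, rcond=None)

def distilled_student(noisy_action, tokens):
    """Current-only inference: future tokens are masked and never generated."""
    f = features(noisy_action, tokens, CURRENT_ONLY)
    return W_out @ f + f @ adapter

distilled = np.stack([distilled_student(a, v) for a, v in samples])
mse_before = np.mean(residual ** 2)
mse_after = np.mean((teacher - distilled) ** 2)
print(f"gap to teacher before: {mse_before:.4f}, after: {mse_after:.4f}")
```

In this sketch the adapter can only narrow the gap to the teacher on the data it was fit to, since the zero adapter is among the least-squares candidates. How much of the residual is recoverable depends on how predictable the future is from the current frame, which is the sense in which the abstract calls the correction compressible.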
