Fewer, Better Frames: A Compute-Normalized Proof of Concept for Coherence-First World-Model Rendering with Model-Guided FSR4 Frame Generation

World models are often evaluated by native frame cadence, but a higher nominal frame rate can trade away long-horizon scene stability. This article reports an independent proof of concept implemented using Overworld's Waypoint-1.5 family and WorldEngine runtime on a Windows fallback stack with ONNX Runtime + DirectML and an FSR4 DX12 bridge. The tested coherence-first branch generates higher-context anchor frames at a 15 FPS presentation-timeline cadence and reconstructs a 30 FPS presentation using latent-delta motion guidance and synthesized depth. It is compared against a lower-context cadence-first baseline that generates frames natively at about 30 FPS under the same seed, route, control script, target presentation duration, and local time-scaling regime. Across forest, sword, desert, and snow scenes, the coherence-first branch preserves path geometry, object identity, large silhouettes, and depth layering for longer, while the baseline degrades earlier into brightness drift and geometric distortion. Lightweight temporal metrics and paired videos support the visual comparison, with LPIPS favoring the coherence-first branch in all tested scenes. Here, "compute-normalized" means approximately matched same-GPU, same-timescale operating points, not exact FLOP parity or measured realtime throughput. A separate, heavier sword-scene probe suggests local non-monotonicity: more context and more denoising did not automatically improve quality. These results support coherence-first allocation as a practical proof-of-concept strategy under a limited inference budget, not as a finished realtime renderer.
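The cadence relationship described above (anchors at 15 FPS, presentation at 30 FPS) can be illustrated with a small slot-labeling sketch. This is not the article's pipeline: the function name `presentation_schedule` and its parameters are hypothetical, and the assumption that each anchor lands exactly on a presentation slot, with one reconstructed frame between consecutive anchors, is made here for illustration only. It shows the timeline arithmetic, not FSR4 frame generation or latent-delta guidance.

```python
from fractions import Fraction


def presentation_schedule(duration_s, anchor_fps=15, present_fps=30):
    """Label each presentation slot as a generated anchor or a reconstructed frame.

    Hypothetical helper: assumes anchors align with presentation slots and
    that present_fps is an integer multiple of anchor_fps.
    """
    if present_fps % anchor_fps != 0:
        raise ValueError("present_fps must be an integer multiple of anchor_fps")
    stride = present_fps // anchor_fps  # presentation slots per anchor frame
    n_slots = int(duration_s * present_fps)
    schedule = []
    for i in range(n_slots):
        timestamp = Fraction(i, present_fps)  # exact slot time in seconds
        kind = "anchor" if i % stride == 0 else "reconstructed"
        schedule.append((i, timestamp, kind))
    return schedule


if __name__ == "__main__":
    # Print the first few slots of a one-second presentation timeline.
    for index, timestamp, kind in presentation_schedule(1)[:4]:
        print(index, float(timestamp), kind)
```

Under these assumptions, half of the presented frames come from the world model and half are reconstructed, which is what frees model budget for higher-context anchors in the coherence-first branch.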
