Accurately anticipating how complex, diverse scenes will evolve requires models that represent uncertainty, simulate along extended interaction chains, and efficiently explore many plausible futures. Yet most existing approaches rely on dense video or latent-space prediction, expending substantial capacity on dense appearance rather than on the underlying sparse trajectories of points in the scene. This makes large-scale exploration of future hypotheses costly and limits performance when long-horizon, multi-modal motion is essential. We address this by formulating the prediction of open-set future scene dynamics as step-wise inference over sparse point trajectories. Our autoregressive diffusion model advances these trajectories through short, locally predictable transitions, explicitly modeling the growth of uncertainty over time. This dynamics-centric representation enables fast rollout of thousands of diverse futures from a single image, optionally guided by initial constraints on motion, while maintaining physical plausibility and long-range coherence. We further introduce OWM, a benchmark for open-set motion prediction based on diverse in-the-wild videos, to evaluate accuracy and variability of predicted trajectory distributions under real-world uncertainty. Our method matches or surpasses dense simulators in predictive accuracy while achieving orders-of-magnitude higher sampling speed, making open-set future prediction both scalable and practical. Project page: http://compvis.github.io/myriad.