We present an approach to modeling an image-space prior on scene dynamics. Our prior is learned from a collection of motion trajectories extracted from real video sequences that depict the natural, oscillating motion of objects such as trees, flowers, candles, and clothes blowing in the wind. Given a single image, our trained model uses a frequency-coordinated diffusion sampling process to predict a per-pixel long-term motion representation in the Fourier domain, which we call a neural stochastic motion texture. This representation can be converted into dense motion trajectories that span an entire video. Combined with an image-based rendering module, these trajectories can be used for a number of downstream applications, such as turning still images into seamlessly looping dynamic videos or allowing users to realistically interact with objects in real pictures.
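The conversion from a Fourier-domain motion representation to dense trajectories can be illustrated with an inverse FFT along the time axis. The following is a minimal sketch, not the authors' implementation. It assumes the representation is stored as a complex array of shape `(K, H, W, 2)` holding the `K` lowest-frequency coefficients of each pixel's x and y displacement. The array layout, the function name `motion_texture_to_trajectories`, and the NumPy normalization convention are all assumptions made for illustration.

```python
import numpy as np


def motion_texture_to_trajectories(coeffs: np.ndarray, num_frames: int) -> np.ndarray:
    """Convert per-pixel Fourier coefficients into per-frame displacements.

    coeffs:     complex array of shape (K, H, W, 2), assumed to hold the
                coefficients for frequency bins 0..K-1 of each pixel's
                (x, y) displacement over time (hypothetical layout).
    num_frames: number of video frames T to reconstruct.

    Returns a real array of shape (T, H, W, 2) giving the displacement of
    every pixel at every frame.
    """
    num_freqs = coeffs.shape[0]
    if num_freqs > num_frames // 2 + 1:
        raise ValueError("more frequency bins than a real signal of this length can hold")
    # irfft zero-pads the missing high-frequency bins when n is given, so a
    # small number of low-frequency coefficients yields a smooth trajectory.
    return np.fft.irfft(coeffs, n=num_frames, axis=0)


# Toy input: one pixel swaying horizontally at frequency bin 2 with an
# amplitude of 3 pixels; every other pixel stays still.
T, K, H, W = 60, 16, 4, 5
coeffs = np.zeros((K, H, W, 2), dtype=np.complex128)
coeffs[2, 0, 0, 0] = 3.0 * T / 2  # irfft scales by 1/T, and the conjugate bin doubles it
trajectories = motion_texture_to_trajectories(coeffs, T)
```

In this sketch the reconstructed displacement of the moving pixel is `3 * cos(2 * pi * 2 * t / T)` for frame index `t`. Because an inverse discrete Fourier transform is periodic over `T` frames, the toy trajectory returns to its starting displacement after the last frame.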