Diffusion models excel at noise-to-data generation, providing a mapping from a Gaussian distribution to a more complex data distribution. However, they struggle to model translations between two complex distributions, which limits their effectiveness in data-to-data tasks. Bridge Matching (BM) models address this by finding a translation between data distributions, but their application to time-correlated data sequences remains unexplored. This is a critical limitation for video generation and manipulation tasks, where maintaining temporal coherence is particularly important. To address this gap, we propose Time-Correlated Video Bridge Matching (TCVBM), a framework that extends BM to time-correlated data sequences in the video domain. TCVBM explicitly models inter-sequence dependencies within the diffusion bridge, directly incorporating temporal correlations into the sampling process. We compare our approach with classical bridge matching and diffusion-based methods on three video-related tasks: frame interpolation, image-to-video generation, and video super-resolution. TCVBM achieves superior performance across multiple quantitative metrics, demonstrating enhanced generation quality and reconstruction fidelity.