In this paper, we propose a novel framework for solving high-definition video inverse problems using latent image diffusion models. Building on recent advances in spatio-temporal optimization for video inverse problems with image diffusion models, our approach adopts latent-space diffusion models to achieve higher video quality and resolution. To address the high computational cost of processing high-resolution frames, we introduce a pseudo-batch consistent sampling strategy that allows efficient operation on a single GPU. Additionally, to improve temporal consistency, we present pseudo-batch inversion, an initialization technique that incorporates informative latents from the measurement. By integrating with SDXL, our framework achieves state-of-the-art video reconstruction across a wide range of spatio-temporal inverse problems, including challenging combinations of frame averaging with spatial restoration tasks such as deblurring, super-resolution, and inpainting. Unlike previous methods, our approach supports multiple aspect ratios (landscape, vertical, and square) and delivers HD-resolution reconstructions (exceeding 1280x720) in under 6 seconds per frame on a single NVIDIA 4090 GPU.
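To make the notion of a spatio-temporal inverse problem concrete, the following is a minimal NumPy sketch of one possible forward (degradation) operator: temporal frame averaging followed by spatial downsampling, as in the super-resolution setting. It is an illustration under stated assumptions, not the operators used in our experiments: the function names (`frame_average`, `downsample`, `degrade`), the circular handling of the temporal boundary, the average-pooling downsampler, and the default `window` and `factor` values are all hypothetical choices made for this example.

```python
import numpy as np


def frame_average(video, window):
    """Average each frame with the following `window - 1` frames.

    `video` has shape (T, H, W, C). The temporal boundary is handled
    circularly (an assumption made for this sketch).
    """
    video = np.asarray(video, dtype=np.float64)
    acc = np.zeros_like(video)
    for shift in range(window):
        # np.roll with a negative shift brings frame t + shift to position t.
        acc += np.roll(video, -shift, axis=0)
    return acc / window


def downsample(video, factor):
    """Spatially downsample by average pooling over factor x factor blocks."""
    video = np.asarray(video, dtype=np.float64)
    t, h, w, c = video.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    blocks = video.reshape(t, h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(2, 4))


def degrade(video, window=3, factor=4):
    """Hypothetical measurement model: y = downsample(frame_average(x))."""
    return downsample(frame_average(video, window), factor)
```

Both stages are linear, so their composition is a single linear operator acting jointly on the temporal and spatial axes. Reconstruction then amounts to recovering a video `x` from a measurement `y = degrade(x)`, typically with additive noise.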