In this paper, we propose a novel framework for solving high-definition video inverse problems with latent image diffusion models. Building on recent advances in spatio-temporal optimization for video inverse problems with image diffusion models, our approach uses latent-space diffusion models to improve video quality and resolution. To handle the high computational cost of processing high-resolution frames, we introduce a pseudo-batch consistent sampling strategy that allows efficient operation on a single GPU. To improve temporal consistency, we also present batch-consistent inversion, an initialization technique that incorporates informative latents from the measurement frames. Integrated with SDXL, our framework achieves state-of-the-art video reconstruction across a wide range of spatio-temporal inverse problems, including complex combinations of frame averaging with spatial degradations such as deblurring, super-resolution, and inpainting. Unlike previous methods, our approach supports multiple aspect ratios (landscape, vertical, and square) and delivers HD reconstructions (above 1280×720) in under 2.5 minutes on a single NVIDIA 4090 GPU.
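To illustrate what a "combination of frame averaging and a spatial degradation" can look like as a forward model y = A(x) + n, here is a minimal NumPy sketch. It composes a temporal frame-averaging operator with a spatial average-pooling downsampler, which stands in for a super-resolution degradation. Several details are illustrative assumptions and are not taken from the paper: the function names, the (T, H, W, C) array layout, averaging over non-overlapping windows (which reduces the frame count), and pooling as the downsampler.

```python
import numpy as np


def frame_average(video: np.ndarray, window: int) -> np.ndarray:
    """Temporal degradation (assumed form): average each non-overlapping
    group of `window` consecutive frames. `video` has shape (T, H, W, C)."""
    t, h, w, c = video.shape
    if t % window != 0:
        raise ValueError("number of frames must be divisible by window")
    return video.reshape(t // window, window, h, w, c).mean(axis=1)


def downsample(video: np.ndarray, factor: int) -> np.ndarray:
    """Spatial degradation (assumed form): average-pool each
    factor x factor patch, as in a super-resolution measurement."""
    t, h, w, c = video.shape
    if h % factor != 0 or w % factor != 0:
        raise ValueError("height and width must be divisible by factor")
    pooled = video.reshape(t, h // factor, factor, w // factor, factor, c)
    return pooled.mean(axis=(2, 4))


def forward_operator(video: np.ndarray, window: int = 2, factor: int = 4,
                     noise_std: float = 0.0, rng=None) -> np.ndarray:
    """Hypothetical spatio-temporal forward model: temporal averaging
    followed by spatial downsampling, plus optional Gaussian noise."""
    y = downsample(frame_average(video, window), factor)
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        y = y + noise_std * rng.standard_normal(y.shape)
    return y


if __name__ == "__main__":
    clip = np.random.default_rng(0).random((8, 64, 64, 3))
    measurement = forward_operator(clip, window=2, factor=4)
    print(measurement.shape)  # (4, 16, 16, 3)
```

Without noise, both stages are linear averaging operators, so their composition is also linear. The inverse problem is to recover the full-rate, full-resolution clip from `measurement`, and the diffusion prior supplies the information that averaging and pooling remove.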