Although diffusion-based zero-shot image restoration and enhancement methods have achieved great success, directly applying them to videos leads to severe temporal flickering. In this paper, we propose the first framework that leverages rapidly developing video diffusion models to help image-based methods maintain temporal consistency in zero-shot video restoration and enhancement. We propose homologous latents fusion, heterogeneous latents fusion, and a CoT-based fusion ratio strategy to exploit both homologous and heterogeneous text-to-video diffusion models as complements to the image method. Moreover, we propose temporal-strengthening post-processing, which uses an image-to-video diffusion model to further improve temporal consistency. Our method is training-free and can be applied to any diffusion-based image restoration or enhancement method. Experimental results demonstrate the superiority of the proposed method.
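For intuition, the sketch below shows one way a per-step latents fusion between the image branch and a video-diffusion branch could be realized; the function name, tensor shapes, linear-blend form, and the commented loop helpers are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def fuse_latents(z_image: torch.Tensor, z_video: torch.Tensor, ratio: float) -> torch.Tensor:
    """Linearly blend per-step latents from the image-based restoration
    branch with latents from a video diffusion branch (a sketch, not the
    paper's exact operator).

    z_image: (T, C, H, W) latents the image method produces frame by frame.
    z_video: (T, C, H, W) latents from the video diffusion model at the
             same denoising step, assumed to share the same latent space.
    ratio:   blend weight in [0, 1]; 0 keeps the image latents unchanged,
             1 fully adopts the video-model latents.
    """
    assert z_image.shape == z_video.shape, "branches must share one latent space"
    return (1.0 - ratio) * z_image + ratio * z_video


# Illustrative use inside a denoising loop; `image_denoise_step`,
# `video_denoise_step`, and `ratio_schedule` are hypothetical. The paper's
# CoT-based strategy presumably sets the ratio adaptively per step.
# for t, ratio_t in zip(timesteps, ratio_schedule):
#     z_image = image_denoise_step(z_image, t)
#     z_video = video_denoise_step(z_video, t)
#     z_image = fuse_latents(z_image, z_video, ratio_t)
```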