In this paper, we introduce DiQP, a novel Transformer-Diffusion model for restoring 8K video quality degraded by codec compression. To the best of our knowledge, our model is the first to restore the artifacts introduced by various codecs (AV1, HEVC) via Denoising Diffusion without injecting additional noise. This approach allows us to model the complex, non-Gaussian nature of compression artifacts and effectively learn to reverse the degradation. Our architecture combines the power of Transformers to capture long-range dependencies with an enhanced windowed mechanism that preserves spatiotemporal context within groups of pixels across frames. To further improve restoration, the model incorporates auxiliary "Look Ahead" and "Look Around" modules, which provide future and surrounding frame information to aid in reconstructing fine details and enhancing overall visual quality. Extensive experiments on different datasets demonstrate that our model outperforms state-of-the-art methods, particularly for high-resolution videos such as 4K and 8K, showcasing its effectiveness in restoring perceptually pleasing videos from highly compressed sources.