In recent years, many convolutional neural network-based models have been designed for JPEG artifact reduction and have achieved notable progress. However, few of these methods are suitable for reducing the compression artifacts of images coded at extremely low bitrates. The main challenge is that a highly compressed image loses too much information, which makes it difficult to reconstruct a high-quality image. To address this issue, we propose a multimodal fusion learning method for text-guided JPEG artifact reduction, in which the corresponding text description not only provides potential prior information about the highly compressed image, but also serves as supplementary information to assist in image deblocking. We fuse image features and text semantic features from both global and local perspectives, and design a contrastive loss to produce visually pleasing results. Extensive experiments, including a user study, show that our method achieves better deblocking results than state-of-the-art methods.
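The abstract does not specify how the global and local fusion or the contrastive loss are computed, so the following is only a minimal sketch under stated assumptions, not the method itself. It assumes that global fusion adds a sentence-level text vector to an image feature, that local fusion attends from one image feature over word-level text features, and that the contrastive term is a ratio of L1 distances that pulls the restored output toward the clean image and away from the compressed input. All function names (`global_fuse`, `local_fuse`, `contrastive_loss`) are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_fuse(pixel_feat, sentence_feat):
    """Global view (assumed form): add a sentence-level vector to an image feature."""
    return [p + s for p, s in zip(pixel_feat, sentence_feat)]


def local_fuse(pixel_feat, word_feats):
    """Local view (assumed form): scaled dot-product attention from one
    image feature over word-level text features, added back residually."""
    scale = math.sqrt(len(pixel_feat))
    weights = softmax([dot(pixel_feat, w) / scale for w in word_feats])
    context = [
        sum(wt * w[i] for wt, w in zip(weights, word_feats))
        for i in range(len(pixel_feat))
    ]
    fused = [p + c for p, c in zip(pixel_feat, context)]
    return fused, weights


def l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def contrastive_loss(restored, clean, compressed, eps=1e-8):
    """Ratio-style contrastive term (assumed form): small when the restored
    features are near the clean ones and far from the compressed ones."""
    return l1(restored, clean) / (l1(restored, compressed) + eps)


# Toy usage with 2-dimensional features.
pixel = [1.0, 0.0]
words = [[1.0, 0.0], [0.0, 1.0]]
fused_local, attn = local_fuse(pixel, words)
fused_global = global_fuse(pixel, [0.5, 0.5])
good = contrastive_loss([0.9, 0.1], [1.0, 0.0], [0.0, 1.0])
bad = contrastive_loss([0.5, 0.5], [1.0, 0.0], [0.0, 1.0])
```

In this toy setting the word feature aligned with the image feature receives the larger attention weight, and a restored output closer to the clean target yields a smaller loss than one halfway between the clean and compressed features. In practice such distances would typically be computed on deep features rather than raw vectors; that choice is likewise an assumption here.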