Image enhancement is an important research area in computer vision and image processing. In recent years, many learning-based image enhancement methods have been developed, among which the look-up table (LUT) has proven to be an effective tool. In this paper, we explore the potential of Contrastive Language-Image Pre-Training (CLIP) guided prompt learning and propose a simple structure, CLIP-LUT, for image enhancement. We find that the prior knowledge in CLIP can effectively discern the quality of degraded images and can therefore provide reliable guidance. Specifically, we first learn image-perceptive prompts that distinguish original images from target images using the CLIP model. Meanwhile, we build a very simple enhancement network on top of a simple baseline that predicts the weights of three different LUTs. The learned prompts are then used like a loss function to steer the enhancement network and improve the model's performance. We demonstrate that simply combining a straightforward method with CLIP yields satisfactory results.
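To make the two ingredients described above concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes that the three LUTs are 3D RGB tables fused by a weighted sum of their entries, that lookup uses the nearest grid point (trilinear interpolation is the more common choice in practice), and that the prompt guidance takes the form of a two-way softmax over cosine similarities between an image embedding and two learned prompt embeddings. All function names, the tone curves, and the fixed weights are hypothetical; in the proposed method the weights would be predicted by the enhancement network and the embeddings would come from CLIP.

```python
import math

def make_lut(n, curve):
    """Build an n x n x n RGB LUT by applying a tone curve to each channel."""
    step = 1.0 / (n - 1)
    return [[[(curve(r * step), curve(g * step), curve(b * step))
              for b in range(n)] for g in range(n)] for r in range(n)]

def fuse_luts(luts, weights):
    """Blend same-sized LUTs into one LUT as a weighted sum of their entries."""
    n = len(luts[0])
    return [[[tuple(sum(w * lut[r][g][b][c] for lut, w in zip(luts, weights))
                    for c in range(3))
              for b in range(n)] for g in range(n)] for r in range(n)]

def apply_lut(lut, pixel):
    """Map one RGB pixel (values in [0, 1]) through the nearest LUT grid point."""
    n = len(lut)
    r, g, b = (min(n - 1, max(0, round(v * (n - 1)))) for v in pixel)
    return lut[r][g][b]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def prompt_guidance_loss(image_emb, degraded_prompt_emb, target_prompt_emb):
    """Assumed loss form: negative log of the softmax probability that the
    image embedding matches the target prompt rather than the degraded one."""
    s_deg = cosine(image_emb, degraded_prompt_emb)
    s_tgt = cosine(image_emb, target_prompt_emb)
    p_tgt = math.exp(s_tgt) / (math.exp(s_tgt) + math.exp(s_deg))
    return -math.log(p_tgt)

# Three hypothetical basis LUTs: identity, brightening, and darkening curves.
n = 5
basis = [make_lut(n, lambda v: v),
         make_lut(n, lambda v: v ** 0.5),
         make_lut(n, lambda v: v ** 2)]
weights = [0.2, 0.7, 0.1]  # hypothetical; stands in for the network's prediction
fused = fuse_luts(basis, weights)
enhanced = apply_lut(fused, (0.25, 0.5, 0.75))

# Toy 2-D embeddings standing in for CLIP features (assumption).
loss_good = prompt_guidance_loss([0.0, 1.0], [1.0, 0.0], [0.0, 1.0])
loss_bad = prompt_guidance_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

In this sketch, an output whose embedding lies closer to the target prompt receives a smaller loss (`loss_good < loss_bad`), which is the sense in which the prompts can steer the enhancement network like a loss function.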