Existing methods have achieved remarkable performance in image dehazing, particularly on synthetic datasets. However, they often struggle with real-world hazy images due to domain shift, limiting their practical applicability. This paper introduces HazeCLIP, a language-guided adaptation framework designed to enhance the real-world performance of pre-trained dehazing networks. Inspired by the Contrastive Language-Image Pre-training (CLIP) model's ability to distinguish between hazy and clean images, we leverage it to evaluate dehazing results. Combined with a region-specific dehazing technique and tailored prompt sets, the CLIP model accurately identifies hazy areas, providing a high-quality, human-like prior that guides the fine-tuning process of pre-trained networks. Extensive experiments demonstrate that HazeCLIP achieves state-of-the-art performance in real-world image dehazing, evaluated through both visual quality and image quality assessment metrics. Code is available at https://github.com/Troivyn/HazeCLIP.