SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency

This work presents an effective depth-consistency self-prompt Transformer for image dehazing. It is motivated by the observation that the depth estimated from an image with haze residuals differs from the depth estimated from its clear counterpart. Enforcing depth consistency between dehazed images and clear ones is therefore essential for dehazing. To this end, we develop a prompt based on the features of depth differences between hazy input images and their clear counterparts, which can guide dehazing models toward better restoration. Specifically, we first apply deep features extracted from the input images to the depth difference features to generate a prompt that contains the haze residual information in the input. We then propose a prompt embedding module that perceives the haze residuals by linearly adding the prompt to the deep features. Further, we develop an effective prompt attention module that pays more attention to haze residuals for better removal. By incorporating the prompt, prompt embedding, and prompt attention into an encoder-decoder network based on VQGAN, we achieve better perceptual quality. Because the depths of clear images are not available at inference, and dehazed images produced by a single feed-forward pass may still contain some haze residuals, we propose a new continuous self-prompt inference scheme that iteratively corrects the dehazing model toward better haze-free image generation. Extensive experiments show that our method performs favorably against state-of-the-art approaches on both synthetic and real-world datasets in terms of perceptual metrics including NIQE, PI, and PIQE.
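
The data flow described in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation: flat lists of floats stand in for images, depth maps, and feature maps, and all function names (`depth_difference`, `make_prompt`, `embed_prompt`, `continuous_self_prompt`) and the `dehaze`, `estimate_depth`, and `encode` callables are hypothetical. Two points are assumptions rather than facts from the abstract: "applying" deep features to the depth difference is modelled as an element-wise product, and at inference the previous dehazed estimate is taken to stand in for the unavailable clear image.

```python
from typing import Callable, List, Optional

Vector = List[float]  # toy stand-in for an image, a depth map, or a feature map


def depth_difference(depth_a: Vector, depth_b: Vector) -> Vector:
    """Element-wise absolute difference between two depth maps."""
    return [abs(a - b) for a, b in zip(depth_a, depth_b)]


def make_prompt(features: Vector, depth_diff: Vector) -> Vector:
    """Build a prompt from deep features and depth-difference features.

    ASSUMPTION: "apply" is modelled as an element-wise product.
    """
    return [f * d for f, d in zip(features, depth_diff)]


def embed_prompt(features: Vector, prompt: Vector) -> Vector:
    """Prompt embedding: linearly add the prompt to the deep features."""
    return [f + p for f, p in zip(features, prompt)]


def continuous_self_prompt(
    hazy: Vector,
    dehaze: Callable[[Vector, Optional[Vector]], Vector],
    estimate_depth: Callable[[Vector], Vector],
    encode: Callable[[Vector], Vector],
    steps: int = 3,
) -> Vector:
    """Iteratively refine a dehazed estimate without a clear reference.

    ASSUMPTION: the previous dehazed estimate replaces the clear image
    when the depth difference is computed at inference time.
    """
    features = encode(hazy)
    estimate = dehaze(hazy, None)  # first pass: no prompt is available yet
    for _ in range(steps):
        # Depth gap between the hazy input and the current estimate.
        diff = depth_difference(estimate_depth(hazy), estimate_depth(estimate))
        prompt = make_prompt(features, diff)
        # A real model would call embed_prompt (and prompt attention) inside dehaze.
        estimate = dehaze(hazy, prompt)
    return estimate
```

In this sketch the `dehaze` callable is expected to use `embed_prompt` internally whenever a prompt is supplied. The number of refinement steps is a free parameter here; the abstract does not state how many iterations are used.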
