Improper exposure often causes severe loss of detail, color distortion, and reduced contrast. Exposure correction still faces two critical challenges: (1) ignoring object-wise regional semantic information leads to color-shift artifacts; (2) real-world exposure images generally lack ground-truth labels, and producing such labels entails extensive manual editing. To tackle these challenges, we propose a new unsupervised semantic-aware exposure correction network. It contains an adaptive semantic-aware fusion module that effectively fuses semantic information extracted from a pre-trained Fast Segment Anything Model (FastSAM) into a shared image feature space. The fused features are then used by our multi-scale residual spatial Mamba group to restore details and adjust the exposure. To avoid manual editing, we propose a pseudo-ground-truth generator guided by CLIP, which is fine-tuned to automatically identify exposure conditions and guide tailored corrections. We further leverage the rich priors of FastSAM and CLIP to develop a semantic-prompt consistency loss that enforces semantic consistency and image-prompt alignment during unsupervised training. Comprehensive experiments demonstrate that our method effectively corrects real-world exposure images and outperforms state-of-the-art unsupervised methods both quantitatively and visually.
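For intuition, a minimal PyTorch sketch of one way such semantic-aware fusion could be realized is given below: cross-attention injects semantic tokens from a frozen segmenter (e.g., a FastSAM encoder) into the image feature space through a learned adaptive gate. The dimensions, gating scheme, and attention design are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class SemanticAwareFusion(nn.Module):
    """Illustrative fusion block: cross-attention injects semantic tokens
    (e.g., from a frozen FastSAM encoder) into image-feature tokens, with
    a learned per-token gate controlling how much semantics to mix in.
    A sketch under stated assumptions, not the paper's actual design."""

    def __init__(self, img_dim=64, sem_dim=256, heads=4):
        super().__init__()
        self.proj = nn.Linear(sem_dim, img_dim)           # align channel widths
        self.attn = nn.MultiheadAttention(img_dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(img_dim, img_dim), nn.Sigmoid())

    def forward(self, img_feat, sem_feat):
        # img_feat: (B, N, C_img) flattened spatial tokens of the image branch
        # sem_feat: (B, M, C_sem) semantic tokens from the segmentation model
        sem = self.proj(sem_feat)
        fused, _ = self.attn(query=img_feat, key=sem, value=sem)
        g = self.gate(img_feat)                           # adaptive gate in [0, 1]
        return img_feat + g * fused                       # gated residual injection

# Usage with dummy tensors:
# out = SemanticAwareFusion()(torch.randn(2, 1024, 64), torch.randn(2, 256, 256))
```

The gated residual keeps the image features dominant and lets the network learn, per spatial token, how strongly to rely on the segmenter's semantics.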
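Likewise, a CLIP-based image-prompt alignment term can be sketched as follows. The prompt texts, the cross-entropy form, and the fixed temperature are assumptions for illustration; the paper's semantic-prompt consistency loss is not specified here.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

# Hypothetical prompt pair; the paper's actual prompt set is not specified.
PROMPTS = ["a well-exposed photo", "a poorly exposed photo"]

class PromptAlignmentLoss(torch.nn.Module):
    """Pulls the corrected image toward the 'well-exposed' text prompt in
    CLIP embedding space. A sketch of an image-prompt alignment term,
    not the paper's exact loss."""

    def __init__(self, device="cpu"):                     # "cpu" loads CLIP in fp32
        super().__init__()
        self.model, _ = clip.load("ViT-B/32", device=device)
        for p in self.model.parameters():                 # keep CLIP frozen
            p.requires_grad_(False)
        with torch.no_grad():
            tokens = clip.tokenize(PROMPTS).to(device)
            text = self.model.encode_text(tokens)
        self.register_buffer("text_feat", F.normalize(text, dim=-1))

    def forward(self, corrected):
        # corrected: (B, 3, 224, 224), already resized and CLIP-normalized
        img = F.normalize(self.model.encode_image(corrected), dim=-1)
        logits = 100.0 * img @ self.text_feat.T           # (B, num_prompts)
        target = torch.zeros(corrected.size(0), dtype=torch.long,
                             device=corrected.device)     # index 0 = well-exposed
        return F.cross_entropy(logits, target)
```

Because CLIP stays frozen, gradients flow only through the corrected image, steering the restoration network toward outputs that CLIP classifies as well exposed.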