Low-light conditions adversely affect machine cognition, limiting the performance of computer vision systems in real-world settings. Because low-light data is scarce and difficult to annotate, we focus on image processing that enhances low-light images and thereby improves the performance of any downstream task model, rather than fine-tuning each model, which can be prohibitively expensive. We propose to improve existing zero-reference low-light enhancement by leveraging the CLIP model both to capture an image prior and to provide semantic guidance. Specifically, we propose a data augmentation strategy, based on image sampling, that learns the image prior via prompt learning without any need for paired or unpaired normal-light data. Next, we propose a semantic guidance strategy that takes maximal advantage of existing low-light annotations by introducing both content and context cues about the image training patches. We show experimentally, in a qualitative study, that the proposed prior and semantic guidance improve overall image contrast and hue as well as background-foreground discrimination, reducing the over-saturation and noise over-amplification common in related zero-reference methods. Because we target machine cognition, we do not rely on an assumed correlation between human perception and downstream task performance; instead, we present an ablation study and a comparison with related zero-reference methods in terms of task-based performance across many low-light datasets, covering image classification and object and face detection, which show the effectiveness of our proposed method.