Disentangled Contrastive Image Translation for Nighttime Surveillance

Nighttime surveillance degrades under poor illumination, and annotating nighttime footage by hand is laborious. The task is therefore challenging, and nighttime remains a security risk. Existing methods rely on multi-spectral images to perceive objects in the dark, but such images suffer from low resolution and lack color. We argue that the ultimate solution for nighttime surveillance is night-to-day translation, or Night2Day, which aims to translate a surveillance scene from nighttime to daytime while maintaining semantic consistency. To achieve this, this paper presents a Disentangled Contrastive (DiCo) learning method. Specifically, to address the poor and complex illumination in nighttime scenes, we propose a learnable physical prior, i.e., the color invariant, which provides a stable perception of a highly dynamic night environment and can be incorporated into the learning pipeline of neural networks. Targeting surveillance scenes, we develop a disentangled representation, an auxiliary pretext task that separates a surveillance scene into foreground and background with contrastive learning. This strategy extracts semantics without supervision and enables our model to achieve instance-aware translation. Finally, we incorporate all of the above modules into generative adversarial networks and achieve high-fidelity translation. This paper also contributes a new surveillance dataset called NightSuR. It includes six scenes to support the study of nighttime surveillance and collects nighttime images that exhibit different properties of nighttime environments, such as flare and extreme darkness. Extensive experiments demonstrate that our method significantly outperforms existing works. The dataset and source code will be released on GitHub soon.
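The abstract does not specify the contrastive objective used to separate foreground from background. As a hedged illustration only, the sketch below shows a generic InfoNCE-style loss, a common choice in contrastive learning, applied to toy feature vectors. The function names (`cosine`, `info_nce`), the temperature value, and the toy features are all assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.07):
    """Generic InfoNCE loss for a single anchor (illustrative, not the paper's loss).

    The loss is small when the anchor is more similar to the positive
    than to any of the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - logits[0]


# Hypothetical toy features: two foreground patches and two background patches.
fg_anchor = [1.0, 0.1]
fg_positive = [0.9, 0.2]
bg_negatives = [[0.0, 1.0], [-0.2, 0.9]]

# Foreground paired with foreground, contrasted against background.
loss_good = info_nce(fg_anchor, fg_positive, bg_negatives)
# Deliberately wrong pairing: a background patch treated as the positive.
loss_bad = info_nce(fg_anchor, bg_negatives[0], [fg_positive, bg_negatives[1]])
```

In this toy setting the correctly paired case yields a much lower loss than the mismatched one, which is the behavior a contrastive objective relies on to pull same-region features together and push foreground and background features apart.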
