Low-light conditions not only hamper human visual experience but also degrade model performance on downstream vision tasks. While existing works have made remarkable progress on day-night domain adaptation, they rely heavily on domain knowledge derived from task-specific nighttime datasets. This paper tackles a more challenging scenario with broader applicability, namely zero-shot day-night domain adaptation, which eliminates reliance on any nighttime data. Unlike prior zero-shot adaptation approaches that emphasize either image-level translation or model-level adaptation, we propose a similarity min-max paradigm that considers both under a unified framework. On the image level, we darken images towards minimum feature similarity to enlarge the domain gap. Then, on the model level, we maximize the feature similarity between the darkened images and their normal-light counterparts for better model adaptation. To the best of our knowledge, this work is the first to jointly optimize both aspects, which significantly improves model generalizability. Extensive experiments demonstrate our method's effectiveness and broad applicability on various nighttime vision tasks, including classification, semantic segmentation, visual place recognition, and video action recognition. Code and pre-trained models are available at https://red-fairy.github.io/ZeroShotDayNightDA-Webpage/.
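The two stages of the similarity min-max idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the gamma-curve darkening, the grid search over `gammas`, the toy linear-plus-ReLU feature extractor, and all function names below are assumptions made purely for illustration. In practice the darkening would be learned and the feature extractor would be the task network, trained by gradient descent.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two feature vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def darken(image, gamma):
    """Hypothetical darkening: a gamma curve on an image in [0, 1] (gamma > 1 darkens)."""
    return np.clip(image, 0.0, 1.0) ** gamma


def extract_features(image, weights):
    """Toy stand-in for a feature extractor: linear projection followed by ReLU."""
    return np.maximum(weights @ image.ravel(), 0.0)


def most_dissimilar_darkening(image, weights, gammas):
    """Image level (min step): pick the darkening with the LOWEST feature similarity."""
    reference = extract_features(image, weights)
    best_sim, best_gamma = None, None
    for gamma in gammas:
        dark_features = extract_features(darken(image, gamma), weights)
        sim = cosine_similarity(reference, dark_features)
        if best_sim is None or sim < best_sim:
            best_sim, best_gamma = sim, gamma
    return best_sim, best_gamma


def adaptation_loss(image, dark_image, weights):
    """Model level (max step): minimizing this loss maximizes feature similarity."""
    sim = cosine_similarity(
        extract_features(image, weights), extract_features(dark_image, weights)
    )
    return 1.0 - sim


# Small demonstration on random data (all values are synthetic).
rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # fake normal-light image in [0, 1]
weights = rng.standard_normal((16, 64))    # fake feature-extractor parameters
gammas = [1.0, 2.0, 4.0]

sim, gamma = most_dissimilar_darkening(image, weights, gammas)
loss = adaptation_loss(image, darken(image, gamma), weights)
print(f"selected gamma={gamma}, similarity={sim:.4f}, adaptation loss={loss:.4f}")
```

In a full training loop the two steps would alternate or be combined: the darkening parameters are pushed towards lower similarity, while the model parameters (here `weights`) are updated to reduce `adaptation_loss` on the resulting pairs.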