Data generation is recognized as a potent strategy for unsupervised domain adaptation (UDA) of semantic segmentation in adverse weather. However, adverse weather scenarios span multiple conditions, and high-fidelity data synthesis with controllable weather remains under-explored in previous UDA work. Recent advances in large-scale text-to-image diffusion models (DM) have opened a new avenue of research, enabling the generation of realistic images conditioned on semantic labels. This capability is instrumental for cross-domain data synthesis from source to target domain, since the two domains share a label space: source-domain labels can be paired with generated pseudo-target images to train UDA. From the UDA perspective, however, there exist several challenges for DM training: (i) ground-truth labels are missing for the target domain; (ii) the prompt generator may produce vague or noisy descriptions of images captured in adverse weather; (iii) existing methods often struggle to handle the complex scene structure and geometry of urban scenes when conditioned only on semantic labels. To tackle these issues, we propose ControlUDA, a diffusion-assisted framework tailored for UDA segmentation under adverse weather conditions. It first leverages target priors from a pre-trained segmentor to tune the DM, compensating for the missing target-domain labels. It further introduces UDAControlNet, a condition-fused, multi-scale, and prompt-enhanced network targeted at high-fidelity data generation in adverse weather. Training UDA with our generated data brings model performance to a new milestone (72.0 mIoU) on the popular Cityscapes-to-ACDC benchmark for adverse weather. Furthermore, ControlUDA improves model generalizability on unseen data.
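To make the cross-domain data-synthesis idea concrete, the sketch below shows a minimal Python pipeline in the same spirit: a label-conditioned diffusion model generates a pseudo-target image from a source-domain semantic map, with the target weather steered through the text prompt. It uses an off-the-shelf segmentation-conditioned ControlNet from Hugging Face diffusers, not the authors' UDAControlNet; the model IDs, file names, and prompt are illustrative assumptions.

```python
# Illustrative sketch (not the authors' UDAControlNet): generate a pseudo
# target-domain image from a source-domain label map with a public
# segmentation-conditioned ControlNet, steering the weather via the prompt.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# A public ControlNet trained on color-coded segmentation maps (example ID).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# A color-rendered source-domain label map (e.g., from Cityscapes
# annotations; hypothetical path). The generated image follows this layout,
# so the source label can be paired with the pseudo-target image for UDA.
label_map = Image.open("cityscapes_label_color.png").convert("RGB")

# The text prompt controls the target weather condition.
image = pipe(
    prompt="a photo of a city street in heavy fog at night, driving scene",
    image=label_map,
    num_inference_steps=30,
).images[0]
image.save("pseudo_target.png")
```

In ControlUDA, the analogous generation step is additionally tuned with target priors from a pre-trained segmentor and enhanced prompts, since an off-the-shelf label-conditioned DM alone handles complex urban geometry poorly, as noted in challenge (iii) above.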