Generative foundation models encode broad visual knowledge and can produce diverse image variations, making them particularly promising for domain generalization. While they can be used to augment training data, synthesizing comprehensive target-domain variations remains slow, expensive, and incomplete. We propose an alternative: using diffusion models at test time to map target images back to the source distribution on which the downstream model was trained. This approach requires only a description of the source domain, leaves the task model untouched, and removes the need for large-scale synthetic data generation. We demonstrate consistent improvements on segmentation, detection, and classification under challenging environmental shifts in real-to-real domain generalization scenarios with unknown target distributions. Our analysis spans multiple generative and downstream models and includes an ensemble variant for added robustness. The method achieves substantial relative gains: 137% on BDD100K-Night, 68% on ImageNet-R, and 62% on DarkZurich.
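To make the test-time pipeline concrete, the sketch below illustrates the general idea under assumptions we supply ourselves: a Stable Diffusion img2img pipeline stands in for the diffusion model, a torchvision ResNet-50 stands in for the frozen downstream model, and the source-domain prompt, denoising strength, and guidance scale are hypothetical placeholders; the actual generative models, prompts, and hyperparameters used in the paper may differ.

```python
# Minimal sketch of diffusion-driven test-time mapping to the source domain.
# Assumptions (not from the paper): Stable Diffusion v1.5 as the generative
# model, an ImageNet ResNet-50 as the frozen task model, and an illustrative
# source-domain prompt and strength setting.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Generative foundation model, used purely at inference time.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# Frozen downstream task model, trained only on the source domain.
weights = models.ResNet50_Weights.IMAGENET1K_V2
task_model = models.resnet50(weights=weights).to(device).eval()
preprocess = weights.transforms()

# Source-domain description: the only extra information the method needs.
source_prompt = "a clear daytime photograph"  # hypothetical description

def predict_via_source_mapping(target_image: Image.Image) -> int:
    """Map a target-domain image toward the source distribution, then classify."""
    # Partially noise and denoise the image under the source-domain prompt;
    # `strength` controls how far the image is pushed toward the source style.
    mapped = pipe(
        prompt=source_prompt,
        image=target_image.resize((512, 512)),
        strength=0.5,
        guidance_scale=7.5,
    ).images[0]
    with torch.no_grad():
        logits = task_model(preprocess(mapped).unsqueeze(0).to(device))
    return int(logits.argmax(dim=1).item())

# Example: classify a night-time image with the unchanged source-trained model.
# prediction = predict_via_source_mapping(Image.open("night_street.jpg").convert("RGB"))
```

In this sketch the downstream model is never fine-tuned; only the input is translated toward the source distribution, which is the property the abstract emphasizes.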