Generative foundation models contain broad visual knowledge and can produce diverse image variations, which makes them particularly promising for domain generalization. They can be used for training-data augmentation, but synthesizing comprehensive target-domain variations remains slow, expensive, and incomplete. We propose an alternative: using diffusion models at test time to map target images back to the source distribution on which the downstream model was trained. This approach requires only a description of the source domain, leaves the task model untouched, and eliminates the need for large-scale synthetic data generation. We demonstrate consistent improvements across segmentation, detection, and classification under challenging environmental shifts in real-to-real domain generalization scenarios with unknown target distributions. Our analysis spans multiple generative and downstream models and includes an ensemble variant for added robustness. The method improves mAP@50 on BDD100K-Night-Det from 10.2 to 31.8, top-1 accuracy on ImageNet-R from 36.1 to 60.8, and mIoU on DarkZurich from 28.6 to 46.3.
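To make the test-time idea concrete, the sketch below shows one plausible way to realize it with an off-the-shelf latent diffusion model via SDEdit-style image-to-image translation: the target image is partially noised and then denoised while conditioned on a text description of the source domain, and the translated image is passed to the frozen downstream model. This is a minimal illustration, not the authors' exact pipeline; the model checkpoint, prompt wording, and strength value are assumptions for the example.

```python
# Minimal sketch (assumed setup, not the paper's exact pipeline):
# SDEdit-style translation of a target-domain image toward the source
# domain using a pretrained latent diffusion model (Hugging Face diffusers).
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Source-domain description: the only information about the training
# distribution this approach requires (wording is an assumption).
source_prompt = "a street scene photographed in clear daylight"

# Example target-domain input (hypothetical file name).
target_image = Image.open("night_image.png").convert("RGB")

# Partially noise the target image and denoise it conditioned on the
# source-domain prompt; `strength` controls how strongly the image is
# pushed toward the source distribution (0.5 is an illustrative choice).
translated = pipe(
    prompt=source_prompt,
    image=target_image,
    strength=0.5,
    guidance_scale=7.5,
).images[0]

# `translated` is then fed to the unchanged downstream model
# (segmentation, detection, or classification) as if it were a
# source-domain image.
translated.save("translated_to_source.png")
```

Because only the input image is transformed, the downstream network and its weights stay exactly as trained; the trade-off is the added diffusion inference cost per test image.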