Adapting a segmentation model from a labeled source domain to a target domain, where only a single unlabeled datum is available, is one of the most challenging problems in domain adaptation and is known as one-shot unsupervised domain adaptation (OSUDA). Most prior works have addressed this problem with style transfer techniques, which stylize the source images to match the appearance of the target domain. Departing from the common notion of transferring only the target ``texture'' information, we leverage text-to-image diffusion models (e.g., Stable Diffusion) to generate a synthetic target dataset of photo-realistic images that not only faithfully depict the style of the target domain but also show novel scenes in diverse contexts. The text interface of our method, Data AugmenTation with diffUsion Models (DATUM), lets us guide image generation toward desired semantic concepts while respecting the original spatial context of the single training image, which is not possible with existing OSUDA methods. Extensive experiments on standard benchmarks show that DATUM surpasses state-of-the-art OSUDA methods by up to +7.1%. The implementation is available at https://github.com/yasserben/DATUM