Adapting a segmentation model from a labeled source domain to a target domain where only a single unlabeled datum is available is one of the most challenging problems in domain adaptation, known as one-shot unsupervised domain adaptation (OSUDA). Most prior works address the problem with style transfer techniques, in which the source images are stylized to have the appearance of the target domain. Departing from the common notion of transferring only the target ``texture'' information, we leverage text-to-image diffusion models (e.g., Stable Diffusion) to generate a synthetic target dataset of photo-realistic images that not only faithfully depict the style of the target domain, but also feature novel scenes in diverse contexts. The text interface in our method, Data AugmenTation with diffUsion Models (DATUM), lets us guide image generation towards desired semantic concepts while respecting the original spatial context of a single training image, which is not possible in existing OSUDA methods. Extensive experiments on standard benchmarks show that DATUM surpasses state-of-the-art OSUDA methods by up to +7.1%. The implementation is available at https://github.com/yasserben/DATUM