Source-Free Domain Generalization (SFDG) aims to develop a model that works on unseen target domains without relying on any source domain. A recent method, PromptStyler, uses text prompts to simulate different distribution shifts in the joint vision-language space, allowing the model to generalize effectively to unseen domains without using any images. However, PromptStyler has two limitations: 1) all style patterns are fixed after the first training phase, so the training set in the second training phase is restricted to a limited set of styles; and 2) the output of its frozen text encoder varies with the style of the input text prompts, making it difficult for the model to learn domain-invariant features. In this paper, we introduce Dynamic PromptStyler (DPStyler), which comprises Style Generation and Style Removal modules to address these issues. The Style Generation module refreshes all styles at every training epoch, while the Style Removal module eliminates variations in the encoder's output features caused by input styles. Moreover, because the Style Generation module, which generates style word vectors by random sampling or style mixing, makes the model sensitive to input text prompts, we introduce a model ensemble method to mitigate this sensitivity. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods on benchmark datasets.
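
To make the per-epoch style refresh concrete, the following is a minimal sketch of how style word vectors might be regenerated at every epoch by random sampling or style mixing. It is not the authors' implementation: the function name `refresh_styles`, the Gaussian sampling distribution, the standard deviation, the mixing probability, and the convex-combination form of mixing are all assumptions chosen for illustration.

```python
import random


def refresh_styles(num_styles, dim, rng, prev_styles=None, mix_prob=0.5, std=0.02):
    """Return a fresh list of `num_styles` style word vectors of length `dim`.

    Each vector is produced in one of two ways (both hypothetical details):
      * random sampling: every coordinate is drawn from N(0, std^2);
      * style mixing: a convex combination of two vectors from `prev_styles`.
    Mixing is only possible when a previous set of styles is available.
    """
    styles = []
    for _ in range(num_styles):
        if prev_styles and rng.random() < mix_prob:
            # Style mixing: interpolate between two previously generated styles.
            a = rng.choice(prev_styles)
            b = rng.choice(prev_styles)
            lam = rng.random()
            styles.append([lam * x + (1.0 - lam) * y for x, y in zip(a, b)])
        else:
            # Random sampling: draw a new vector from a zero-mean Gaussian.
            styles.append([rng.gauss(0.0, std) for _ in range(dim)])
    return styles


rng = random.Random(0)
styles = None
history = []
for epoch in range(3):
    # All styles are regenerated at the start of each epoch rather than being
    # fixed once, so every epoch trains on a different set of styles.
    styles = refresh_styles(num_styles=8, dim=16, rng=rng, prev_styles=styles)
    history.append(styles)
```

In a full pipeline, each vector would presumably take the place of a style pseudo-word in a text prompt before it is passed to the text encoder; that step is omitted here because the abstract does not specify it.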