Neural image classifiers are known to undergo severe performance degradation when exposed to inputs that exhibit covariate shifts with respect to the training distribution. A general interventional data augmentation (IDA) mechanism that simulates arbitrary interventions over spurious variables has often been conjectured as a theoretical solution to this problem and approximated with varying degrees of success. In this work, we study how well modern Text-to-Image (T2I) generators and associated image editing techniques can solve the problem of IDA. We experiment across a diverse collection of domain generalization benchmarks, ablating key dimensions of T2I generation, including interventional prompts, conditioning mechanisms, and post-hoc filtering. We show that T2I-based augmentation substantially outperforms previous state-of-the-art image augmentation techniques, regardless of how each dimension is configured. We discuss the comparative advantages of using T2I for image editing versus synthesis, and find that a simple retrieval baseline is a surprisingly effective alternative, which raises interesting questions about how generative models should be evaluated in the context of domain generalization.