Neural image classifiers are known to suffer severe performance degradation when exposed to inputs that exhibit covariate shift with respect to the training distribution. In this paper, we show that the ability of recent Text-to-Image (T2I) generators to edit images via natural-language prompts, thereby approximating interventions, is a promising tool for training more robust classifiers. Using current open-source models, we find that a variety of prompting strategies are effective for producing augmented training datasets sufficient to achieve state-of-the-art performance in (1) widely adopted Single-Domain Generalization benchmarks, (2) reducing classifiers' dependency on spurious features, and (3) facilitating the application of Multi-Domain Generalization techniques when fewer training domains are available.