This study investigates the robustness of image classifiers to text-guided corruptions. We use diffusion models to edit images into different domains. Unlike prior work that relies on synthetic or hand-picked data for benchmarking, we use diffusion models because they are generative models that can edit images while preserving their semantic content. As a result, the corruptions are more realistic and the comparison is more informative. The approach also requires no manual labeling, so large-scale benchmarks can be created with less effort. We define a prompt hierarchy based on the original ImageNet hierarchy to apply edits in different domains. In addition to introducing a new benchmark, we investigate the robustness of different vision models. Our results show that the performance of image classifiers decreases significantly under different language-based corruptions and edit domains. We also observe that convolutional models are more robust than transformer architectures. Additionally, common data augmentation techniques improve performance on both the original data and the edited images. These findings can help improve the design of image classifiers and contribute to the development of more robust machine learning systems. The code for generating the benchmark is available at https://github.com/ckoorosh/RobuText.