Although many image classifiers perform well on average, their performance can deteriorate substantially on semantically coherent subgroups of the data that were under-represented during training. These systematic errors can affect both fairness for demographic minority groups and robustness and safety under domain shift. A major challenge is identifying such underperforming subgroups when they are not annotated and occur very rarely. We leverage recent advances in text-to-image models and search the space of textual descriptions of subgroups ("prompts") for subgroups on which the target model performs poorly on prompt-conditioned synthesized data. To tackle the exponentially growing number of subgroups, we employ combinatorial testing. We call this procedure PromptAttack, as it can be interpreted as an adversarial attack in prompt space. We study subgroup coverage and identifiability with PromptAttack in a controlled setting and find that it identifies systematic errors with high accuracy. We then apply PromptAttack to ImageNet classifiers and identify novel systematic errors on rare subgroups.
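To illustrate how combinatorial testing can keep the number of prompts manageable, the following is a minimal sketch, not the procedure used in this work. It assumes a greedy pairwise (2-way) covering strategy, and the attribute dimensions, their values, the prompt template, and the names `DIMENSIONS`, `pairwise_prompts`, and `to_prompt` are all hypothetical. The text-to-image model and the classifier under test are deliberately left out.

```python
from itertools import combinations, product

# Hypothetical attribute dimensions that together describe a subgroup.
DIMENSIONS = {
    "size": ["small", "large"],
    "color": ["red", "blue", "green"],
    "background": ["in a forest", "on a beach", "in snow"],
}


def pairwise_prompts(dimensions):
    """Greedily pick attribute combinations until every pair of values
    from every two dimensions appears in at least one chosen combination."""
    names = list(dimensions)
    # Every (dim_a, value_a, dim_b, value_b) pair that still needs covering.
    uncovered = {
        (a, va, b, vb)
        for a, b in combinations(names, 2)
        for va in dimensions[a]
        for vb in dimensions[b]
    }
    # Candidate pool: the full Cartesian product (fine for small spaces only).
    candidates = [dict(zip(names, values)) for values in product(*dimensions.values())]

    def pairs_of(combo):
        return {(a, combo[a], b, combo[b]) for a, b in combinations(names, 2)}

    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


def to_prompt(combo, class_name):
    """Render one attribute combination as a text-to-image prompt (assumed template)."""
    return f"A photo of a {combo['size']} {combo['color']} {class_name} {combo['background']}."


prompts = [to_prompt(combo, "minivan") for combo in pairwise_prompts(DIMENSIONS)]
for prompt in prompts:
    print(prompt)
```

In this toy space, exhaustive enumeration would need 18 prompts, while the pairwise selection needs fewer and still exercises every two-attribute interaction. In a full pipeline each prompt would be passed to a text-to-image model, and the target classifier's accuracy on the generated images would be recorded per prompt to flag low-performing subgroups.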