Despite the excellent average-case performance of many image classifiers, their performance can deteriorate substantially on semantically coherent subgroups of the data that were under-represented in the training data. These systematic errors can impact both fairness for demographic minority groups and robustness and safety under domain shift. A major challenge is to identify such subgroups with subpar performance when the subgroups are not annotated and occur only rarely. We leverage recent advances in text-to-image models and search in the space of textual descriptions of subgroups ("prompts") for subgroups on which the target model performs poorly on the prompt-conditioned synthesized data. To tackle the exponentially growing number of subgroups, we employ combinatorial testing. We call this procedure PromptAttack, as it can be interpreted as an adversarial attack in prompt space. We study subgroup coverage and identifiability with PromptAttack in a controlled setting and find that it identifies systematic errors with high accuracy. We then apply PromptAttack to ImageNet classifiers and identify novel systematic errors on rare subgroups.
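Combinatorial testing is what keeps the search over subgroup prompts tractable. The sketch below illustrates one common form of it, pairwise (2-way) coverage: instead of enumerating every combination of subgroup attributes, it greedily picks attribute assignments until every pair of values from any two attributes appears in at least one prompt. The attribute names and values, the prompt template, and the helpers `pairwise_prompts` and `to_prompt` are hypothetical illustrations, not PromptAttack's actual implementation; synthesizing images from the prompts and scoring the target classifier are omitted.

```python
from itertools import combinations, product

# Hypothetical subgroup attributes; each full assignment describes one subgroup.
ATTRIBUTES = {
    "object": ["car", "truck"],
    "color": ["red", "blue", "white"],
    "setting": ["in snow", "at night", "in a desert"],
}


def pairwise_prompts(attributes):
    """Greedily select attribute assignments until every value pair drawn
    from two different attributes appears in at least one assignment."""
    names = list(attributes)

    def pairs_of(assignment):
        # All attribute-value pairs realized by one full assignment.
        return {((a, assignment[a]), (b, assignment[b]))
                for a, b in combinations(names, 2)}

    # Every value pair that the test suite must cover.
    uncovered = {((a, va), (b, vb))
                 for a, b in combinations(names, 2)
                 for va in attributes[a]
                 for vb in attributes[b]}
    # Candidate rows: the full Cartesian product of attribute values.
    candidates = [dict(zip(names, values))
                  for values in product(*(attributes[n] for n in names))]

    suite = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


def to_prompt(assignment):
    # Hypothetical prompt template for a text-to-image model.
    return f"a photo of a {assignment['color']} {assignment['object']} {assignment['setting']}"


if __name__ == "__main__":
    for row in pairwise_prompts(ATTRIBUTES):
        print(to_prompt(row))
```

Here the full grid has 18 attribute combinations, while the pairwise suite needs fewer prompts yet still exercises every pairwise interaction. The savings grow quickly as attributes are added, which is the motivation for combinatorial testing when the number of subgroups grows exponentially.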