A growing body of work studies Blindspot Discovery Methods (BDMs): methods that use an image embedding to find semantically meaningful subsets of the data (i.e., subsets united by a human-understandable concept) on which an image classifier performs significantly worse. Motivated by observed gaps in prior work, we introduce SpotCheck, a new framework for evaluating BDMs that uses synthetic image datasets to train models with known blindspots, and PlaneSpot, a new BDM that uses a 2D image representation. We use SpotCheck to run controlled experiments that identify factors influencing BDM performance (e.g., the number of blindspots in a model, or the features used to define a blindspot) and show that PlaneSpot is competitive with, and in many cases outperforms, existing BDMs. Importantly, we validate these findings with additional experiments that use real image data from MS-COCO, a large image benchmark dataset. Our findings suggest several promising directions for future work on BDM design and evaluation. Overall, we hope that the methodology and analyses presented in this work will help facilitate a more rigorous science of blindspot discovery.