With a growing interest in understanding neural network prediction strategies, Concept Activation Vectors (CAVs) have emerged as a popular tool for modeling human-understandable concepts in the latent space. Commonly, CAVs are computed with linear classifiers that optimize the separability of latent representations of samples with and without a given concept. However, in this paper we show that such a separability-oriented computation leads to solutions that may diverge from the actual goal of precisely modeling the concept direction. This discrepancy can be attributed to the significant influence of distractor directions, i.e., signals unrelated to the concept, which are picked up by the filters (i.e., weights) of linear models to optimize class separability. To address this, we introduce pattern-based CAVs, which focus solely on concept signals and thereby provide more accurate concept directions. We evaluate various CAV methods in terms of their alignment with the true concept direction and their impact on CAV applications, including concept sensitivity testing and model correction for shortcut behavior caused by data artifacts. We demonstrate the benefits of pattern-based CAVs using the Pediatric Bone Age, ISIC2019, and FunnyBirds datasets with VGG, ResNet, and EfficientNet model architectures.
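The difference between a filter-based and a pattern-based CAV can be illustrated with a minimal toy sketch. It is not the implementation evaluated in the paper: the two-dimensional synthetic "activations", the least-squares fit used as a stand-in for a linear classifier, and the covariance-based pattern estimate cov(x, t) / var(t) are all assumptions made for illustration, as are the names `filter_cav`, `pattern_cav`, `concept_dir`, and `distractor_dir`.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def filter_cav(x, t):
    """Filter-style CAV: weights of a least-squares linear fit of the labels.

    Assumption: least squares stands in for a separability-oriented
    linear classifier (e.g. SVM or logistic regression).
    """
    xc = x - x.mean(axis=0)
    tc = t - t.mean()
    w, *_ = np.linalg.lstsq(xc, tc, rcond=None)
    return w


def pattern_cav(x, t):
    """Pattern-style CAV: per-dimension covariance with the label, cov(x, t) / var(t).

    Assumption: this simple regression-based pattern estimate is used
    here for illustration only.
    """
    xc = x - x.mean(axis=0)
    tc = t - t.mean()
    return xc.T @ tc / (tc @ tc)


# Synthetic activations (hypothetical): the concept shifts samples along
# concept_dir, while a strong label-independent distractor varies along
# distractor_dir, which is not orthogonal to the concept direction.
rng = np.random.default_rng(0)
n = 20_000
concept_dir = np.array([1.0, 0.0])
distractor_dir = np.array([1.0, 1.0]) / np.sqrt(2.0)

t = rng.integers(0, 2, size=n).astype(float)   # concept labels (0 or 1)
distractor = 3.0 * rng.standard_normal(n)      # distractor strength per sample
noise = 0.1 * rng.standard_normal((n, 2))      # small isotropic noise
x = np.outer(t, concept_dir) + np.outer(distractor, distractor_dir) + noise

cos_filter = cosine(filter_cav(x, t), concept_dir)
cos_pattern = cosine(pattern_cav(x, t), concept_dir)
print(f"filter CAV alignment:  {cos_filter:.3f}")
print(f"pattern CAV alignment: {cos_pattern:.3f}")
```

In this toy setup the filter has to rotate away from the concept direction in order to cancel the distractor, so its alignment with `concept_dir` is noticeably lower than that of the pattern, which only measures how the activations co-vary with the concept label.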