Local Concept Embeddings for Analysis of Concept Distributions in DNN Feature Spaces

Insights into the learned latent representations are imperative for verifying deep neural networks (DNNs) in critical computer vision (CV) tasks. Therefore, state-of-the-art supervised Concept-based eXplainable Artificial Intelligence (C-XAI) methods associate each user-defined concept, such as "car", with a single vector in the DNN latent space (the concept embedding vector). In the case of concept segmentation, these vectors linearly separate activation map pixels belonging to a concept from those belonging to the background. Existing methods for concept segmentation, however, fall short of capturing sub-concepts (e.g., "proximate car" and "distant car") and concept overlap (e.g., between "bus" and "truck"). In other words, they do not capture the full distribution of concept representatives in latent space. For the first time, this work shows that these simplifying assumptions frequently do not hold and that distribution information can be particularly useful for understanding DNN-learned notions of sub-concepts, concept confusion, and concept outliers. To allow exploration of learned concept distributions, we propose a novel local concept analysis framework. Instead of optimizing a single global concept vector on the complete dataset, it generates a local concept embedding (LoCE) vector for each individual sample. We use the distribution formed by the LoCEs to explore the latent concept distribution by fitting Gaussian mixture models (GMMs), hierarchical clustering, and concept-level information retrieval and outlier detection. Despite its context sensitivity, our method's concept segmentation performance is competitive with global baselines. Analysis results are obtained on two datasets and five diverse vision DNN architectures, including vision transformers (ViTs).
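To illustrate the idea of a per-sample concept vector, the following is a minimal sketch, not the authors' implementation. It assumes that a local embedding is fitted by gradient descent on a per-pixel logistic (binary cross-entropy) loss between the linearly projected activation map and the concept mask. The function names `fit_local_concept_embedding` and `project`, the loss, the absence of a bias term, and the hyperparameters `steps` and `lr` are all illustrative assumptions.

```python
import numpy as np


def fit_local_concept_embedding(activations, mask, steps=200, lr=0.5):
    """Fit one concept vector to a single sample's activation map.

    activations: array of shape (C, H, W), one layer's output for one image.
    mask: binary array of shape (H, W), the concept mask at activation resolution.
    Returns a vector of shape (C,).

    Assumption: plain gradient descent on a logistic loss, no bias term.
    """
    c, h, w = activations.shape
    x = activations.reshape(c, h * w).T        # one row per activation pixel
    y = mask.reshape(h * w).astype(float)      # target label per pixel
    v = np.zeros(c)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ v)))     # predicted concept probability
        v -= lr * (x.T @ (p - y)) / y.size     # gradient of the mean logistic loss
    return v


def project(activations, v):
    """Per-pixel concept score in (0, 1) for an activation map of shape (C, H, W)."""
    logits = np.tensordot(v, activations, axes=([0], [0]))  # shape (H, W)
    return 1.0 / (1.0 + np.exp(-logits))
```

Calling `fit_local_concept_embedding` once per image and stacking the resulting vectors yields a set of points, one per sample. Under the assumptions above, this set is what the distribution analyses named in the abstract (GMM fitting, hierarchical clustering, retrieval, and outlier detection) would operate on.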
