Concept Bottleneck Models (CBMs) have attracted growing attention because they provide concept-based explanations for black-box deep learning models while achieving high final prediction accuracy using human-like concepts. However, training current CBMs relies heavily on the accuracy and richness of the annotated concepts in the dataset. These concept labels are typically provided by experts, which is costly and requires significant resources and effort. In addition, concept saliency maps frequently misalign with input saliency maps, so concept predictions correspond to irrelevant input features, an issue related to annotation alignment. To address these limitations, we propose a new framework called SSCBM (Semi-supervised Concept Bottleneck Model), which is suited to practical situations where annotated data is scarce. SSCBM trains jointly on labeled and unlabeled data and aligns the unlabeled data at the concept level; to this end, we propose a strategy for generating pseudo labels and an alignment loss. Experiments demonstrate that SSCBM is both effective and efficient: with only 20% labeled data, it achieves 93.19% concept accuracy (96.39% in the fully supervised setting) and 75.51% prediction accuracy (79.82% in the fully supervised setting).
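The abstract does not specify how the pseudo labels are generated or what form the alignment loss takes, so the following is only an illustrative sketch under assumed choices, not the SSCBM method itself. It assumes pseudo concept labels are obtained by averaging the concept vectors of the k nearest labeled samples in some feature space, and it uses a binary cross-entropy term on unlabeled data as a stand-in for a concept-level objective. All function names (`knn_pseudo_labels`, `bce`, `mean_bce`, `joint_concept_loss`) and the `weight` parameter are hypothetical.

```python
import math


def knn_pseudo_labels(unlabeled_feats, labeled_feats, labeled_concepts, k=2):
    """Assumed strategy: give each unlabeled sample the mean concept vector
    of its k nearest labeled neighbors (Euclidean distance)."""
    n_concepts = len(labeled_concepts[0])
    pseudo = []
    for u in unlabeled_feats:
        # Rank labeled samples by distance to the unlabeled feature vector.
        order = sorted(range(len(labeled_feats)),
                       key=lambda i: math.dist(u, labeled_feats[i]))
        nearest = order[:k]
        pseudo.append([
            sum(labeled_concepts[i][j] for i in nearest) / len(nearest)
            for j in range(n_concepts)
        ])
    return pseudo


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted concept probabilities
    and (possibly soft) concept targets for one sample."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def mean_bce(pred_rows, target_rows):
    """Average the per-sample BCE over a batch."""
    return sum(bce(p, t) for p, t in zip(pred_rows, target_rows)) / len(pred_rows)


def joint_concept_loss(labeled_preds, labeled_concepts,
                       unlabeled_preds, pseudo_concepts, weight=1.0):
    """Supervised concept loss on labeled data plus a weighted
    pseudo-label loss on unlabeled data (hypothetical weighting)."""
    supervised = mean_bce(labeled_preds, labeled_concepts)
    unsupervised = mean_bce(unlabeled_preds, pseudo_concepts)
    return supervised + weight * unsupervised
```

In a training loop, a loss of this shape would be added to the usual label-prediction loss, so that the unlabeled samples still contribute a concept-level training signal despite having no expert annotations.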