Concept Bottleneck Models (CBMs) aim to make deep learning more interpretable by routing predictions through human-understandable concepts. However, they provide no way to verify whether the learned concepts actually match their intended human meaning, which undermines that interpretability. We introduce Prototype-Grounded Concept Models (PGCMs), which ground concepts in learned visual prototypes: image parts that serve as explicit evidence for the concepts. This grounding lets users inspect concept semantics directly and supports targeted human intervention at the prototype level to correct misalignments. Empirically, PGCMs match the predictive performance of state-of-the-art CBMs while substantially improving transparency, interpretability, and intervenability.