Concept Bottleneck Models (CBMs) aim to improve interpretability in deep learning by structuring predictions through human-understandable concepts, but they provide no way to verify whether the learned concepts align with their intended human meaning, which undermines interpretability. We introduce Prototype-Grounded Concept Models (PGCMs), which ground concepts in learned visual prototypes: image parts that serve as explicit evidence for the concepts. This grounding enables direct inspection of concept semantics and supports targeted human intervention at the prototype level to correct misalignments. Empirically, PGCMs achieve predictive performance comparable to state-of-the-art CBMs while substantially improving transparency, interpretability, and intervenability.
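The abstract does not specify how prototypes are matched to an image or how concept scores feed the task predictor. The following is therefore only an illustrative sketch under several assumptions, none of which is the PGCM formulation. It uses a ProtoPNet-style log-similarity between patch embeddings and prototype vectors, scores each concept by its best-matching prototype, and places a linear task head on top. All names (`concept_scores`, `predict`, the `disabled` set, the toy data) are hypothetical. The sketch shows the two properties the abstract claims: each concept score points back to a specific image patch as evidence, and a human can intervene by switching off a single misaligned prototype.

```python
import math
from typing import AbstractSet, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def proto_similarity(patches: List[Vector], proto: Vector, eps: float = 1e-4) -> Tuple[float, int]:
    """Log-similarity of one prototype (ProtoPNet-style, an assumption here),
    maximized over image patches; also returns the index of the best patch."""
    best, best_idx = -math.inf, -1
    for i, patch in enumerate(patches):
        d = sum((x - y) ** 2 for x, y in zip(patch, proto))  # squared L2 distance
        s = math.log((d + 1.0) / (d + eps))  # always > 0, large when d is near 0
        if s > best:
            best, best_idx = s, i
    return best, best_idx


def concept_scores(
    patches: List[Vector],
    prototypes: Dict[str, List[Vector]],
    disabled: AbstractSet[Tuple[str, int]] = frozenset(),
) -> Tuple[Dict[str, float], Dict[str, Optional[Tuple[int, int]]]]:
    """Score each concept by its strongest enabled prototype.

    evidence[concept] = (prototype index, patch index) is the image part a human
    would inspect; `disabled` holds (concept, prototype index) pairs switched off
    by a hypothetical prototype-level intervention."""
    scores: Dict[str, float] = {}
    evidence: Dict[str, Optional[Tuple[int, int]]] = {}
    for concept, protos in prototypes.items():
        best, where = 0.0, None  # stays 0.0 / None if every prototype is disabled
        for j, proto in enumerate(protos):
            if (concept, j) in disabled:
                continue
            s, patch_idx = proto_similarity(patches, proto)
            if s > best:
                best, where = s, (j, patch_idx)
        scores[concept] = best
        evidence[concept] = where
    return scores, evidence


def predict(scores: Dict[str, float], weights: Dict[str, Dict[str, float]]) -> str:
    """Hypothetical linear task head: returns the label with the largest logit."""
    logits = {
        label: sum(w.get(k, 0.0) * s for k, s in scores.items())
        for label, w in weights.items()
    }
    return max(logits, key=logits.get)


# Toy example: two 2-D patch embeddings and hand-picked prototypes.
patches = [[1.0, 0.0], [5.0, 5.0]]
prototypes = {
    "striped": [[1.0, 0.0]],
    # Suppose patch 1 is background and prototype 0 of "wing" latched onto it.
    "wing": [[5.0, 5.0], [0.0, 3.0]],
}
weights = {"bird": {"wing": 1.0}, "zebra": {"striped": 0.5}}

scores, evidence = concept_scores(patches, prototypes)
print(evidence["wing"], predict(scores, weights))  # → (0, 1) bird

# Intervention: a human judges prototype ("wing", 0) misaligned and disables it.
fixed_scores, fixed_evidence = concept_scores(patches, prototypes, disabled={("wing", 0)})
print(fixed_evidence["wing"], predict(fixed_scores, weights))  # → (1, 0) zebra
```

Because every concept score is tied to a (prototype, patch) pair, inspecting `evidence` shows *why* a concept fired. Correcting the misalignment only requires disabling one prototype, not retraining or editing the concept score directly.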