The recent mass adoption of DNNs, even in safety-critical scenarios, has shifted the focus of the research community towards the creation of inherently interpretable models. Concept Bottleneck Models (CBMs) constitute a popular approach in which hidden layers are tied to human-understandable concepts, allowing for investigation and correction of the network's decisions. However, CBMs usually suffer from: (i) performance degradation and (ii) lower interpretability than intended, due to the sheer number of concepts contributing to each decision. In this work, we propose a simple yet highly intuitive interpretable framework based on Contrastive Language-Image models and a single sparse linear layer. In stark contrast to related approaches, the sparsity in our framework is achieved via principled Bayesian arguments, by inferring concept presence via a data-driven Bernoulli distribution. As we experimentally show, our framework not only outperforms recent CBM approaches accuracy-wise, but also yields high per-example concept sparsity, facilitating the individual investigation of the emerging concepts.
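
To make the described pipeline concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes image and concept embeddings are already available (random arrays stand in for the outputs of a contrastive language-image encoder), that per-example concept-presence probabilities come from a hypothetical linear map `gate_w` followed by a sigmoid, and that a binary Bernoulli gate masks the image-concept similarities before a single linear layer `class_w`. All names, shapes, and the thresholding rule are illustrative.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparse_concept_logits(img_emb, concept_emb, gate_w, class_w,
                          rng=None, threshold=0.5):
    """Gate image-concept similarities with per-example binary indicators.

    img_emb:     (N, D) image embeddings (assumed precomputed)
    concept_emb: (M, D) text embeddings of the concept names (assumed precomputed)
    gate_w:      (D, M) hypothetical weights producing Bernoulli probabilities
    class_w:     (M, C) weights of the single linear classification layer
    """
    # Cosine similarity between every image and every concept: (N, M)
    sims = l2_normalize(img_emb) @ l2_normalize(concept_emb).T
    # Data-driven probability that each concept is present in each example
    probs = sigmoid(img_emb @ gate_w)
    if rng is None:
        # Deterministic gating (assumed inference rule): keep likely concepts
        gates = (probs > threshold).astype(sims.dtype)
    else:
        # Stochastic gating: draw one Bernoulli sample per (example, concept)
        gates = rng.binomial(1, probs).astype(sims.dtype)
    # Only the active concepts contribute to the class scores: (N, C)
    logits = (gates * sims) @ class_w
    return logits, gates

# Toy data standing in for real encoder outputs and learned weights
rng = np.random.default_rng(0)
N, D, M, C = 4, 16, 10, 3
img_emb = rng.normal(size=(N, D))
concept_emb = rng.normal(size=(M, D))
gate_w = rng.normal(size=(D, M))
class_w = rng.normal(size=(M, C))

logits, gates = sparse_concept_logits(img_emb, concept_emb, gate_w, class_w)
active_per_example = gates.sum(axis=1)  # number of concepts used per example
```

In this sketch, `active_per_example` is the quantity that per-example sparsity refers to: the fewer gates are switched on for an image, the fewer concepts need to be inspected to explain its prediction. Training the gates (for example, through a variational objective with a sparsity-inducing prior) is outside the scope of this illustration.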