Deep learning methods are highly accurate, yet their opaque decision process prevents them from earning full human trust. Concept-based models aim to address this issue by learning tasks based on a set of human-understandable concepts. However, state-of-the-art concept-based models rely on high-dimensional concept embedding representations which lack a clear semantic meaning, which calls the interpretability of their decision process into question. To overcome this limitation, we propose the Deep Concept Reasoner (DCR), the first interpretable concept-based model that builds upon concept embeddings. In DCR, neural networks do not make task predictions directly; instead, they build syntactic rule structures using concept embeddings. DCR then executes these rules on meaningful concept truth degrees to provide a final interpretable and semantically consistent prediction in a differentiable manner. Our experiments show that DCR: (i) improves by up to 25% over state-of-the-art interpretable concept-based models on challenging benchmarks; (ii) discovers meaningful logic rules matching known ground truths even in the absence of concept supervision during training; and (iii) facilitates the generation of counterfactual examples by providing the learnt rules as guidance.
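The split between building a rule from concept embeddings and executing it on concept truth degrees can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-concept "polarity" and "relevance" scores, the single linear layer standing in for the neural networks, and the choice of fuzzy operators (min for conjunction, max for disjunction) are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def build_rule(embeddings, w_polarity, w_relevance):
    """Hypothetical rule builder: map each concept embedding to two scores in [0, 1].

    polarity  ~ 1 means the concept appears positively, ~ 0 means negated.
    relevance ~ 1 means the concept appears in the rule at all.
    A single linear layer stands in for the neural networks (assumption).
    """
    polarity = sigmoid(embeddings @ w_polarity)
    relevance = sigmoid(embeddings @ w_relevance)
    return polarity, relevance


def execute_rule(truth, polarity, relevance):
    """Evaluate the rule on concept truth degrees with fuzzy operators.

    Every operation is differentiable almost everywhere, so in an autodiff
    framework gradients could flow back to the rule builder.
    """
    # Literal value: agreement between the polarity and the truth degree.
    literal = polarity * truth + (1.0 - polarity) * (1.0 - truth)
    # Irrelevant concepts are neutralised: (NOT relevant) OR literal.
    filtered = np.maximum(literal, 1.0 - relevance)
    # Conjunction over all concepts (min t-norm, an assumed choice).
    return float(filtered.min())


def rule_to_string(names, polarity, relevance, threshold=0.5):
    """Render the thresholded rule as a readable conjunction."""
    literals = [
        name if p > threshold else "~" + name
        for name, p, r in zip(names, polarity, relevance)
        if r > threshold
    ]
    return " & ".join(literals) if literals else "True"


# Toy usage with random embeddings and weights (illustrative values only).
rng = np.random.default_rng(0)
names = ["red", "round", "large"]
embeddings = rng.normal(size=(3, 4))      # one 4-d embedding per concept
w_polarity = rng.normal(size=4)
w_relevance = rng.normal(size=4)
truth = np.array([0.9, 0.1, 0.5])         # concept truth degrees

polarity, relevance = build_rule(embeddings, w_polarity, w_relevance)
print(rule_to_string(names, polarity, relevance))
print(execute_rule(truth, polarity, relevance))
```

In this sketch the embeddings only decide the structure of the rule, while the prediction itself is computed from the truth degrees, so the printed rule is exactly what was evaluated.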