Interpretability and explainability of neural networks are becoming increasingly important, especially in safety-critical domains and for upholding the social right to explanation. Concept-based explanations align well with how humans reason, which makes them a good way to explain models. Concept Embedding Models (CEMs) are one such concept-based explanation architecture. They have been shown to overcome the trade-off between explainability and performance. However, they have a key limitation: they require concept annotations for all of their training data. For large datasets, obtaining these annotations can be expensive and infeasible. Motivated by this, we propose Automatic Concept Embedding Models (ACEMs), which learn the concept annotations automatically.