Concept-based explainability methods provide insight into deep learning systems by constructing explanations using human-understandable concepts. While the literature on human reasoning demonstrates that we exploit relationships between concepts when solving tasks, it is unclear whether concept-based methods incorporate the rich structure of inter-concept relationships. We analyse the concept representations learnt by concept-based models to understand whether these models correctly capture inter-concept relationships. First, we empirically demonstrate that state-of-the-art concept-based models produce concept representations that lack stability and robustness, and that such methods fail to capture inter-concept relationships. Then, we develop a novel algorithm which leverages inter-concept relationships to improve concept intervention accuracy, demonstrating how correctly capturing inter-concept relationships can improve performance on downstream tasks.