Recently, a series of studies have tried to extract interactions between input variables modeled by a DNN and define such interactions as concepts encoded by the DNN. However, strictly speaking, there still lacks a solid guarantee whether such interactions indeed represent meaningful concepts. Therefore, in this paper, we examine the trustworthiness of interaction concepts from four perspectives. Extensive empirical studies have verified that a well-trained DNN usually encodes sparse, transferable, and discriminative concepts, which is partially aligned with human intuition.
翻译:近期,一系列研究试图提取深度神经网络(DNN)建模的输入变量之间的交互作用,并将此类交互作用定义为DNN所编码的概念。然而,严格来说,此类交互作用是否确实代表有意义的概念仍缺乏可靠保障。因此,本文从四个维度审视了交互概念的可信度。大量实证研究表明,经过充分训练的DNN通常编码出稀疏、可迁移且具有判别性的概念,这与人类直觉存在部分一致性。