This study aims to prove the emergence of symbolic concepts (or, more precisely, sparse primitive inference patterns) in well-trained deep neural networks (DNNs). Specifically, we prove that the emergence occurs under the following three conditions: (i) the high-order derivatives of the network output with respect to the input variables are all zero; (ii) the DNN can be used on occluded samples, and it yields higher confidence when the input sample is less occluded; (iii) the confidence of the DNN does not significantly degrade on occluded samples. These conditions are quite common, and we prove that under them the DNN encodes only a relatively small number of sparse interactions between input variables. Moreover, such interactions can be considered symbolic primitive inference patterns encoded by the DNN, because we show that the DNN's inference scores on an exponentially large number of randomly masked samples can always be well mimicked by the numerical effects of just a few interactions.
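In this line of work, the "interactions" are typically formalized as Harsanyi dividends: writing v(x_T) for the network output on the sample x with only the variables in T kept and all others masked to a baseline, each subset S has an effect I(S) = Σ_{T⊆S} (−1)^{|S|−|T|} v(x_T), and Möbius inversion gives the universal-matching property v(x_T) = Σ_{S⊆T} I(S) for every masked sample. The following minimal Python sketch illustrates this on a hand-written toy function standing in for a DNN (all names, including `v` and `harsanyi_interactions`, are illustrative, not the paper's code): it computes every interaction, verifies that each masked output is exactly the sum of the effects of the interactions it contains, and shows that only a few interactions are non-zero, which is the sparsity the abstract claims for well-trained DNNs.

```python
import itertools
import numpy as np

def harsanyi_interactions(v, n):
    """Compute the Harsanyi dividend I(S) for every subset S of {0,...,n-1}.

    v: callable mapping a frozenset T (the kept, i.e. unmasked, variables)
       to a scalar output v(x_T); masking to the baseline happens inside v.
    Returns {frozenset: I(S)} with I(S) = sum_{T⊆S} (-1)^{|S|-|T|} v(x_T).
    """
    subsets = [frozenset(c) for k in range(n + 1)
               for c in itertools.combinations(range(n), k)]
    return {S: sum((-1) ** (len(S) - len(T)) * v(T)
                   for k in range(len(S) + 1)
                   for T in map(frozenset,
                                itertools.combinations(sorted(S), k)))
            for S in subsets}

# Toy "network": three input variables, one linear term plus one
# pairwise multiplicative (AND-like) term.
x = np.array([1.0, 2.0, 3.0])   # hypothetical input values
baseline = np.zeros(3)          # masked variables replaced by 0

def v(T):
    z = np.where([i in T for i in range(3)], x, baseline)
    return z[0] + 2.0 * z[1] * z[2]

I = harsanyi_interactions(v, 3)

# Universal matching: the output on every masked sample x_T equals the
# summed effects of all interactions S contained in T.
for k in range(4):
    for T in map(frozenset, itertools.combinations(range(3), k)):
        assert np.isclose(v(T), sum(val for S, val in I.items() if S <= T))

# Sparsity: of the 2^3 = 8 possible interactions, only two are non-zero here.
print({tuple(sorted(S)): val for S, val in I.items() if abs(val) > 1e-9})
# {(0,): 1.0, (1, 2): 12.0}
```

The exact matching above always uses all 2^n interactions; the paper's claim is the stronger statement that, under conditions (i)-(iii), nearly all of the I(S) of a well-trained DNN are negligible, so a few salient interactions suffice to mimic the inference scores on all masked samples.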