This paper aims to prove that symbolic concepts emerge in well-trained AI models. We prove that if (1) the high-order derivatives of the model output with respect to the input variables are all zero, (2) the model can be applied to occluded samples and yields higher confidence when the input sample is less occluded, and (3) the model's confidence does not degrade significantly on occluded samples, then the model encodes sparse interactive concepts. Each interactive concept represents an interaction between a specific set of input variables and has a certain numerical effect on the model's inference score. Specifically, we prove that the inference score can always be represented as the sum of the interaction effects of all interactive concepts. We further hope to show that the conditions for the emergence of symbolic concepts are quite common. This means that, for most AI models, a small number of interactive concepts usually suffices to mimic the model outputs on arbitrarily masked samples.
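The decomposition claim can be illustrated with a small sketch. The abstract does not name the interaction metric, so the code below assumes the Harsanyi dividend, $I(S) = \sum_{T \subseteq S} (-1)^{|S|-|T|} v(x_T)$, where $v(x_T)$ is the model output when only the variables in $T$ are left unmasked. The function `toy_model` is a hypothetical stand-in for a trained model, and none of these names come from the paper. Under these assumptions, the masked output $v(x_S)$ equals the sum of $I(T)$ over all $T \subseteq S$, and only a few effects are non-zero.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a tuple, smallest first."""
    items = tuple(items)
    for r in range(len(items) + 1):
        yield from combinations(items, r)


def toy_model(unmasked):
    """Hypothetical score v(x_S) when only the variables in `unmasked` are visible.

    This is an illustrative assumption, not a model from the paper.
    """
    s = set(unmasked)
    score = 0.0
    if 0 in s:
        score += 1.0
    if {0, 1} <= s:
        score += 2.0
    if {1, 2, 3} <= s:
        score -= 0.5
    return score


def interaction_effects(model, n):
    """Assumed metric: Harsanyi dividend I(S) = sum_{T subset of S} (-1)^(|S|-|T|) v(x_T)."""
    effects = {}
    for S in subsets(range(n)):
        effects[S] = sum(
            (-1) ** (len(S) - len(T)) * model(T) for T in subsets(S)
        )
    return effects


def reconstruct(effects, unmasked):
    """Sum the effects of all concepts fully contained in the unmasked set."""
    s = set(unmasked)
    return sum(e for S, e in effects.items() if set(S) <= s)


N = 4
effects = interaction_effects(toy_model, N)

# Concepts with a non-negligible effect: sparse compared with all 2**N subsets.
salient = {S: e for S, e in effects.items() if abs(e) > 1e-9}

# Largest reconstruction error over every one of the 2**N masked samples.
max_error = max(
    abs(reconstruct(effects, S) - toy_model(S)) for S in subsets(range(N))
)

print(f"{len(salient)} salient concepts out of {2 ** N}: {salient}")
print(f"max reconstruction error: {max_error}")
```

The enumeration costs on the order of $3^N$ model evaluations, so this brute-force version is only practical for a handful of input variables.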