Bridging the large gap between neural and symbolic representations could allow symbolic reasoning to be built into neural networks at a fundamental level. Motivated by how humans gradually build complex symbolic representations from prototype symbols learned through perception and interaction with the environment, we propose a Neural-Symbolic Transitional Dictionary Learning (TDL) framework. The framework uses an EM algorithm to learn a transitional representation of data that compresses the high-dimensional information of the visual parts of an input into a set of tensors, treated as neural variables, and discovers the implicit predicate structure in a self-supervised way. We implement the framework with a diffusion model by treating the decomposition of the input as a cooperative game, and then learn predicates by prototype clustering. We additionally use reinforcement learning (RL), enabled by the Markovian property of diffusion models, to further tune the learned prototypes by incorporating subjective factors. We run extensive experiments on three abstract compositional visual object datasets, which require the model to segment parts using shape alone, without visual features such as texture, color, or shadows, and on three neural/symbolic downstream tasks. The results demonstrate that the learned representation enables interpretable decomposition of visual input and smooth adaptation to downstream tasks, neither of which is offered by existing methods.
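To make the idea of learning predicates by prototype clustering with alternating EM-style updates more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that part embeddings are already available as plain lists of floats, and it substitutes ordinary hard-EM clustering (k-means) for whatever procedure the framework actually uses. The names `cluster_prototypes`, `squared_distance`, and `num_prototypes` are hypothetical and chosen only for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_prototypes(embeddings, num_prototypes, num_iters=20, seed=0):
    """Hard-EM (k-means) clustering of part embeddings into prototypes.

    Assumption: `embeddings` is a list of equal-length lists of floats.
    This is a generic stand-in, not the TDL procedure itself.
    """
    rng = random.Random(seed)
    # Initialize prototypes from randomly chosen embeddings.
    prototypes = [list(p) for p in rng.sample(embeddings, num_prototypes)]
    assignments = [0] * len(embeddings)
    for _ in range(num_iters):
        # E-step: assign each embedding to its nearest prototype.
        assignments = [
            min(range(num_prototypes),
                key=lambda k: squared_distance(e, prototypes[k]))
            for e in embeddings
        ]
        # M-step: move each prototype to the mean of its members.
        for k in range(num_prototypes):
            members = [e for e, a in zip(embeddings, assignments) if a == k]
            if members:  # keep the old prototype if the cluster is empty
                dim = len(members[0])
                prototypes[k] = [
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                ]
    return prototypes, assignments


if __name__ == "__main__":
    # Toy data: two well-separated groups of 2-D "part embeddings".
    points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    protos, labels = cluster_prototypes(points, num_prototypes=2)
    print(protos, labels)
```

In this sketch each prototype plays the role of a candidate predicate: inputs whose parts fall in the same cluster would share the same symbol. A soft-assignment variant (full EM with responsibilities) would follow the same alternating structure.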