Neural-symbolic approaches to machine learning combine the advantages of connectionist and symbolic methods. Typically, these models employ a first module based on a neural architecture to extract features from complex data. These features are then processed as symbols by a symbolic engine that provides reasoning, concept structures, composability, better generalization, and out-of-distribution learning, among other capabilities. However, neural approaches to grounding symbols in sensory data, although powerful, still mostly require heavy training and tedious labeling. This paper presents a new symbolic-only method for generating hierarchical concept structures from complex spatial sensory data. The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept. Following his suggestion, the model extracts atomic features from raw data by computing elemental sequential comparisons in a stream of multivariate numerical values. Higher-level constructs are built from these features by subjecting them to further comparisons in a recursive process. At any stage of the recursion, a concept structure may be obtained from these constructs and features by means of Formal Concept Analysis. Results show that the model is able to produce fairly rich yet human-readable conceptual representations without training. Additionally, the concept structures obtained through the model (i) are highly composable, which potentially enables the generation of 'unseen' concepts, (ii) allow formal reasoning, and (iii) have inherent abilities for generalization and out-of-distribution learning. Consequently, this method may offer an interesting angle on current neural-symbolic research. Future work is required to develop a training methodology so that the model can be tested against a larger dataset.
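The pipeline described in the abstract (sequential comparisons, recursion over the resulting features, then Formal Concept Analysis) can be illustrated with a minimal sketch. This is not the authors' implementation. The three-symbol encoding of a difference (`+`, `-`, `=`), the attribute naming scheme `L<level>:<symbol>`, the choice of variables as formal objects, and the function names `sign_diff`, `build_context`, and `formal_concepts` are all assumptions made for illustration. The concept enumeration is a naive closure under intersection, suitable only for tiny contexts.

```python
# Illustrative sketch only: encoding, names, and context design are assumptions.

SYMBOL = {1: "+", -1: "-", 0: "="}


def sign_diff(seq):
    """Compare each value with its successor: +1 (rise), -1 (fall), 0 (no change)."""
    return [(b > a) - (b < a) for a, b in zip(seq, seq[1:])]


def build_context(stream, names, depth):
    """Build a formal context {variable: set of difference attributes}.

    stream: list of samples, each a tuple with one value per variable.
    Level 1 compares raw values; each further level compares the
    features produced by the previous level (the recursive step).
    """
    columns = dict(zip(names, zip(*stream)))
    context = {}
    for name, seq in columns.items():
        attrs = set()
        current = list(seq)
        for level in range(1, depth + 1):
            current = sign_diff(current)
            attrs |= {f"L{level}:{SYMBOL[s]}" for s in current}
        context[name] = frozenset(attrs)
    return context


def formal_concepts(context):
    """Enumerate all formal concepts as (extent, intent) pairs.

    Every intent is an intersection of object intents (or the full
    attribute set), so closing the set of intents under intersection
    with each object intent yields all of them.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for obj_intent in context.values():
        intents |= {obj_intent & intent for intent in intents}
    return [
        (frozenset(o for o, a in context.items() if intent <= a), intent)
        for intent in intents
    ]


# Toy stream: x rises, y falls, z stays constant.
stream = [(1, 5, 2), (2, 4, 2), (3, 3, 2), (4, 2, 2), (5, 1, 2)]
context = build_context(stream, names=("x", "y", "z"), depth=2)
concepts = formal_concepts(context)

for extent, intent in sorted(concepts, key=lambda c: len(c[1])):
    print(sorted(extent), sorted(intent))
```

In this toy example the three variables differ at the first level (rising, falling, constant) but share the second-level attribute of an unchanging rate of change, so the resulting lattice groups them under one common concept without any training step.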