Structure learning is a core problem in AI, central to the fields of neuro-symbolic AI and statistical relational learning. It consists of automatically learning a logical theory from data. The basis for structure learning is mining repeating patterns in the data, known as structural motifs. Finding these patterns reduces the exponential search space and therefore guides the learning of formulas. Despite its importance, motif learning is still not well understood. We present the first principled approach for mining structural motifs in lifted graphical models, languages that blend first-order logic with probabilistic models. The approach uses a stochastic process to measure the similarity of entities in the data. Our first contribution is an algorithm that depends on two intuitive hyperparameters: one controlling the uncertainty in the entity similarity measure, and one controlling the softness of the resulting rules. Our second contribution is a preprocessing step in which we perform hierarchical clustering on the data to reduce the search space to the most relevant data. Our third contribution is an O(n ln n) algorithm (where n is the number of entities in the data) for clustering structurally related data. We evaluate our approach on standard benchmarks and show that it outperforms state-of-the-art structure learning approaches by up to 6% in accuracy and up to 80% in runtime.