Structure learning is a core problem in AI, central to neuro-symbolic AI and statistical relational learning. It consists of automatically learning a logical theory from data. Structure learning builds on mining repeating patterns in the data, known as structural motifs. Finding these patterns reduces the exponential search space and thereby guides the learning of formulas. Despite its importance, motif learning is still not well understood. We present the first principled approach for mining structural motifs in lifted graphical models, languages that blend first-order logic with probabilistic models. The approach uses a stochastic process to measure the similarity of entities in the data. Our first contribution is an algorithm that depends on two intuitive hyperparameters: one controlling the uncertainty in the entity similarity measure, and one controlling the softness of the resulting rules. Our second contribution is a preprocessing step in which we perform hierarchical clustering on the data to reduce the search space to the most relevant data. Our third contribution is an O(n ln n) algorithm (in the size of the entities in the data) for clustering structurally related data. We evaluate our approach on standard benchmarks and show that it outperforms state-of-the-art structure learning approaches by up to 6% in terms of accuracy and up to 80% in terms of runtime.
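To make the idea of measuring entity similarity with a stochastic process more concrete, here is a minimal sketch. It is not the algorithm from the paper. The abstract does not say which stochastic process is used, so the sketch assumes a uniform random walk over a graph of entities. The function names (`walk_distribution`, `signature`, `group_similar`) and the parameters `steps` and `epsilon` are hypothetical. A simple threshold grouping stands in for the hierarchical clustering step, and `epsilon` only loosely mirrors a tolerance on the similarity measure.

```python
from collections import defaultdict


def walk_distribution(graph, start, steps):
    """Exact distribution of a uniform random walk after each of `steps` steps.

    Assumption: a uniform random walk is used here purely for illustration.
    """
    dist = {start: 1.0}
    history = []
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            neighbours = graph[node]
            for nb in neighbours:
                nxt[nb] += p / len(neighbours)
        dist = dict(nxt)
        history.append(dist)
    return history


def signature(graph, start, steps):
    """Label-free summary of a node: sorted visit probabilities per step."""
    n = len(graph)
    sig = []
    for dist in walk_distribution(graph, start, steps):
        probs = sorted(dist.values(), reverse=True)
        probs += [0.0] * (n - len(probs))  # pad so all signatures align
        sig.extend(probs)
    return sig


def distance(a, b):
    """Largest absolute difference between two signatures."""
    return max(abs(x - y) for x, y in zip(a, b))


def group_similar(graph, steps=3, epsilon=1e-9):
    """Greedily group nodes whose signatures differ by at most `epsilon`.

    Hypothetical stand-in for a clustering step, not hierarchical clustering.
    """
    groups = []  # list of (representative signature, member nodes)
    for node in sorted(graph):
        sig = signature(graph, node, steps)
        for rep, members in groups:
            if distance(rep, sig) <= epsilon:
                members.append(node)
                break
        else:
            groups.append((sig, [node]))
    return [members for _, members in groups]


# Toy entity graph: one hub connected to three structurally identical leaves.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(group_similar(star))                # leaves grouped, hub on its own
print(group_similar(star, epsilon=1.0))   # a loose tolerance merges everything
```

In this toy graph the three leaves play the same structural role, so their walk signatures coincide and they fall into one group. Raising `epsilon` blurs the distinction between roles, which illustrates, in a simplified way, why a tolerance on the similarity measure affects which patterns are found.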