Many applications require explainable node classification in knowledge graphs. To this end, a popular ``white-box'' approach is class expression learning: given sets of positive and negative nodes, class expressions in description logics are learned that separate the positive from the negative nodes. Most existing approaches are search-based: they generate many candidate class expressions and select the best one. However, they often take a long time to find suitable class expressions. In this paper, we cast class expression learning as a translation problem and propose a new family of class expression learning approaches, which we dub neural class expression synthesizers. Training examples are ``translated'' into class expressions in a fashion akin to machine translation. Consequently, our synthesizers are not subject to the runtime limitations of search-based approaches. We study three instances of this novel family of approaches, based on LSTMs, GRUs, and set transformers, respectively. An evaluation on four benchmark datasets suggests that our approach effectively synthesizes high-quality class expressions with respect to the input examples in approximately one second on average. Moreover, a comparison with state-of-the-art approaches suggests that we achieve better F-measures on large datasets. For reproducibility, we provide our implementation and pretrained models in our public GitHub repository at https://github.com/dice-group/NeuralClassExpressionSynthesis