Knowledge graphs (KGs), such as Resource Description Framework (RDF) data, represent relationships between entities as triples of the form <subject, predicate, object>. Knowledge graph embedding (KGE) is crucial in machine learning applications, particularly node classification and link prediction, and remains a vital research topic within the semantic web community. RDF-star introduces the quoted triple (QT), a triple that is used as the subject or object of another triple. RDF-star also permits a QT to serve as a compositional entity of another QT, which enables the representation of recursive, hyper-relational KGs with nested structures. However, existing KGE models fail to adequately learn the semantics of QTs and entities, primarily because they do not account for RDF-star graphs containing multi-level nested QTs and QT-QT relationships. This study introduces RDF-star2Vec, a novel KGE model designed specifically for RDF-star graphs. RDF-star2Vec introduces graph walk techniques that enable probabilistic transitions between a QT and its compositional entities. Feature vectors for QTs, entities, and relations are then derived from the generated sequences using the structured skip-gram model. Additionally, we provide a dataset and a benchmarking framework for data mining tasks on complex RDF-star graphs. Experiments demonstrate that RDF-star2Vec outperforms recent extensions of RDF2Vec on a range of tasks, including classification, clustering, entity relatedness, and QT similarity.
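To make the idea of probabilistic transitions between a QT and its compositional entities more concrete, the following is a minimal sketch, not the RDF-star2Vec implementation. It assumes a toy graph, models a QT as a Python tuple, and uses a single hypothetical parameter `p_jump` for the probability of moving between a QT and its subject or object instead of following an ordinary edge. The names `build_index`, `walk`, and `label` are illustrative only; the actual transition rules and parameters of the model may differ.

```python
import random
from collections import defaultdict

# Assumption: a quoted triple (QT) is modelled as a plain tuple (s, p, o).
QT = ("alice", "worksFor", "acme")

# Toy RDF-star graph: two ordinary triples and one triple whose subject is a QT.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "worksFor", "acme"),
    (QT, "since", "2019"),
]


def build_index(triples):
    """Index outgoing edges, and the QTs each entity is a component of."""
    out_edges = defaultdict(list)  # node -> [(predicate, object)]
    member_of = defaultdict(list)  # entity -> [QT, ...]
    for s, p, o in triples:
        out_edges[s].append((p, o))
        for node in (s, o):
            if isinstance(node, tuple):
                member_of[node[0]].append(node)
                member_of[node[2]].append(node)
    return out_edges, member_of


def label(node):
    """Render a node as one token; QTs (possibly nested) become <<s p o>>."""
    if isinstance(node, tuple):
        return "<<%s %s %s>>" % tuple(label(x) for x in node)
    return node


def walk(start, out_edges, member_of, depth, p_jump, rng):
    """Random walk that may hop between a QT and its components (hypothetical rule)."""
    seq = [label(start)]
    node = start
    for _ in range(depth):
        if isinstance(node, tuple):
            jumps = [node[0], node[2]]      # QT -> its subject or object
        elif node in member_of:
            jumps = member_of[node]         # entity -> a QT that contains it
        else:
            jumps = []
        edges = out_edges.get(node, [])
        if jumps and (not edges or rng.random() < p_jump):
            node = rng.choice(jumps)
            seq.append(label(node))
        elif edges:
            pred, node = rng.choice(edges)
            seq.extend([pred, label(node)])
        else:
            break                           # dead end: no edges and no jumps
    return seq


if __name__ == "__main__":
    out_edges, member_of = build_index(TRIPLES)
    rng = random.Random(0)
    for _ in range(3):
        print(" ".join(walk("alice", out_edges, member_of, 4, 0.5, rng)))
```

Each printed line is one token sequence. In a full pipeline, many such sequences would be passed to a word-embedding trainer (the abstract names the structured skip-gram model) so that QTs, entities, and relations each receive a vector.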