Knowledge graphs such as Wikidata represent knowledge through both structural and textual information. For each of the two modalities, dedicated approaches (graph embeddings and language models, respectively) learn patterns that allow novel structural knowledge to be predicted. Few approaches have integrated learning and inference across both modalities, and those that exist exploit the interaction of structural and textual knowledge only partially. In our approach, we build on existing strong single-modality representations and use hypercomplex algebra to represent both (i) single-modality embeddings and (ii) the interaction between different modalities and their complementary means of knowledge representation. More specifically, we propose Dihedron and Quaternion representations of 4D hypercomplex numbers to integrate four modalities: structural knowledge graph embeddings, word-level representations (e.g., Word2vec, Fasttext), sentence-level representations (Sentence transformer), and document-level representations (Sentence transformer, Doc2vec). Our unified vector representation scores the plausibility of labelled edges via Hamilton and Dihedron products, thus modeling pairwise interactions between different modalities. Extensive experimental evaluation on standard benchmark datasets shows that our two new models, which use abundant textual information in addition to sparse structural knowledge, improve performance on link prediction tasks.
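To make the role of the two products concrete, the following is a minimal sketch of how a 4D hypercomplex number could hold one value per modality and how a Hamilton or Dihedron product mixes those components pairwise. It is an illustration under stated assumptions, not the authors' implementation: the function names, the assignment of modalities to components, the split-quaternion convention used for the Dihedron product (i² = -1, j² = k² = +1), and the inner-product scoring rule are all assumed here.

```python
# Illustrative sketch only. A hypercomplex number is a 4-tuple (a, b, c, d)
# standing for a + b*i + c*j + d*k. As an ASSUMPTION, the four components hold
# the structural, word-level, sentence-level and document-level values.

def hamilton_product(p, q):
    """Quaternion (Hamilton) product: i^2 = j^2 = k^2 = -1, ij = k."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def dihedron_product(p, q):
    """Dihedron product, ASSUMING the split-quaternion convention
    i^2 = -1, j^2 = k^2 = +1, ij = k."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 + c1 * c2 + d1 * d2,
        a1 * b2 + b1 * a2 - c1 * d2 + d1 * c2,
        a1 * c2 + c1 * a2 - b1 * d2 + d1 * b2,
        a1 * d2 + d1 * a2 + b1 * c2 - c1 * b2,
    )

def score(head, relation, tail, product):
    """Hypothetical plausibility score for an edge (head, relation, tail):
    transform the head by the relation, then take the inner product with
    the tail. The scoring rule itself is an assumption."""
    transformed = product(head, relation)
    return sum(x * y for x, y in zip(transformed, tail))

if __name__ == "__main__":
    head = (1, 2, 3, 4)      # made-up values, one per modality
    relation = (0, 1, 0, 0)  # made-up relation embedding
    tail = (1, 1, 1, 1)
    print(score(head, relation, tail, hamilton_product))
    print(score(head, relation, tail, dihedron_product))
```

Every output component of either product is a sum of products between components of the two inputs, which is the sense in which such a product models pairwise interactions between modalities. The two algebras differ only in the signs of some cross terms, so they weight the same interactions differently.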