Generating vector representations (embeddings) of OWL ontologies is an increasingly important task because of its applications in predicting missing facts and in knowledge-enhanced learning in fields such as bioinformatics. The underlying semantics of OWL ontologies are expressed using Description Logics (DLs). Early approaches to generating embeddings relied on constructing a graph out of an ontology, neglecting the semantics of the logic therein. Recent semantics-preserving embedding methods often target lightweight DL languages such as $\mathcal{EL}^{++}$, ignoring the more expressive information in ontologies. Although some approaches aim to embed more expressive DLs such as $\mathcal{ALC}$, they require the existence of individuals, which many real-world ontologies lack. We propose an ontology embedding method for the DL $\mathcal{ALC}$ that considers the lattice structure of concept descriptions. We use connections between DLs and Category Theory to materialize the lattice structure and embed it using an order-preserving embedding method. We show that our method outperforms state-of-the-art methods in several knowledge base completion tasks. Furthermore, we incorporate saturation procedures that increase the information within the constructed lattices. We make our code and data available at \url{https://github.com/bio-ontology-research-group/catE}.
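To make the idea of an order-preserving embedding of a concept lattice more concrete, the following is a minimal Python sketch. It is not the implementation from the repository above: the concept names `Parent` and `Doctor`, the helper functions `lattice_edges` and `order_violation`, and the hand-picked 2-dimensional vectors are all hypothetical, and the penalty is a generic coordinate-wise order-violation term chosen for illustration. The sketch assumes the convention that a more specific concept has coordinates at least as large as those of any concept that subsumes it.

```python
def order_violation(sub, sup):
    """Penalty that is 0 exactly when sub >= sup in every coordinate.

    Assumed convention (illustrative only): more specific concepts lie
    farther from the origin than the concepts that subsume them.
    """
    return sum(max(0.0, b - a) ** 2 for a, b in zip(sub, sup))


def lattice_edges(c, d):
    """Subsumption edges (sub, sup) of the small lattice built from two concept names."""
    meet = f"({c} ⊓ {d})"  # conjunction: greatest lower bound of c and d
    join = f"({c} ⊔ {d})"  # disjunction: least upper bound of c and d
    return [
        ("⊥", meet),
        (meet, c),
        (meet, d),
        (c, join),
        (d, join),
        (join, "⊤"),
    ]


# Hypothetical concept names, used only for illustration.
edges = lattice_edges("Parent", "Doctor")

# Hand-picked (not learned) vectors that respect every edge above.
emb = {
    "⊤": (0.0, 0.0),
    "(Parent ⊔ Doctor)": (1.0, 1.0),
    "Parent": (2.0, 1.0),
    "Doctor": (1.0, 2.0),
    "(Parent ⊓ Doctor)": (2.0, 2.0),
    "⊥": (3.0, 3.0),
}

# Total penalty over all lattice edges; it is zero when the order is preserved.
total = sum(order_violation(emb[sub], emb[sup]) for sub, sup in edges)

# A reversed (invalid) subsumption yields a positive penalty.
reversed_penalty = order_violation(emb["Parent"], emb["(Parent ⊓ Doctor)"])
```

In a learned setting, a penalty of this kind would be minimized over the lattice edges, typically together with a term that penalizes non-edges being embedded as if they were subsumptions. The vectors are fixed by hand here only to show what an order-respecting assignment looks like.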