Pre-trained language models (LMs) have made significant advances in various Natural Language Processing (NLP) domains, but it is unclear to what extent they can infer formal semantics in ontologies, which are often used to represent conceptual knowledge and serve as the schema of data graphs. To investigate an LM's knowledge of ontologies, we propose OntoLAMA, a set of inference-based probing tasks and datasets derived from ontology subsumption axioms involving both atomic and complex concepts. We conduct extensive experiments on ontologies of different domains and scales, and our results demonstrate that LMs encode relatively less background knowledge of Subsumption Inference (SI) than of traditional Natural Language Inference (NLI), but can improve on SI significantly when given a small number of samples. We will open-source our code and datasets.