Large language models as oracles for instantiating ontologies with domain-specific knowledge

Background. Endowing intelligent systems with semantic data commonly requires designing and instantiating ontologies with domain-specific knowledge. Especially in the early phases, those activities are typically performed manually by human experts, who may draw on their own experience. The resulting process is therefore time-consuming, error-prone, and often biased by the personal background of the ontology designer. Objective. To mitigate that issue, we propose a novel domain-independent approach to automatically instantiate ontologies with domain-specific knowledge, by leveraging large language models (LLMs) as oracles. Method. Starting from (i) an initial schema composed of inter-related classes and properties and (ii) a set of query templates, our method queries the LLM multiple times and generates instances for both classes and properties from its replies. Thus, the ontology is automatically filled with domain-specific knowledge that is compliant with the initial schema. As a result, the ontology is quickly and automatically enriched with many instances, which experts may keep, adjust, discard, or complement according to their own needs and expertise. Contribution. We formalise our method in a general way and instantiate it over various LLMs, as well as on a concrete case study. We report experiments rooted in the nutritional domain, where an ontology of food meals and their ingredients is automatically instantiated from scratch, starting from a categorisation of meals and their relationships. There, we analyse the quality of the generated ontologies and compare the ontologies attained by exploiting different LLMs. Experimentally, our approach achieves a quality metric that is up to five times higher than the state of the art, while reducing erroneous entities and relations by up to a factor of ten. Finally, we provide a SWOT analysis of the proposed method.
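The loop described under "Method" can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Ontology` container, the two template strings, the `parse_list` helper, and the canned `fake_oracle` are all hypothetical names introduced here, and the oracle is a stub standing in for a real LLM call. The sketch only assumes that the schema lists classes and properties with a domain and a range, and that replies come back as one name per line.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# An oracle maps a textual prompt to a raw textual reply (assumption:
# in practice this would wrap a call to some LLM).
Oracle = Callable[[str], str]


@dataclass
class Ontology:
    """Hypothetical container: a schema plus the instances found so far."""
    classes: List[str]
    properties: Dict[str, Tuple[str, str]]  # property name -> (domain, range)
    instances: Dict[str, Set[str]] = field(default_factory=dict)
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)


# Illustrative query templates; the wording is an assumption.
CLASS_TEMPLATE = "List examples of {cls}. One name per line."
PROPERTY_TEMPLATE = (
    "List the {rng} values related to {ind} via {prop}. One name per line."
)


def parse_list(reply: str) -> List[str]:
    """Turn a line-per-item reply into names, dropping bullets and blanks."""
    items = []
    for line in reply.splitlines():
        name = line.strip().lstrip("-*• ").strip()
        if name:
            items.append(name)
    return items


def populate(onto: Ontology, oracle: Oracle) -> Ontology:
    """Query the oracle once per class, then once per (individual, property)."""
    for cls in onto.classes:
        reply = oracle(CLASS_TEMPLATE.format(cls=cls))
        onto.instances.setdefault(cls, set()).update(parse_list(reply))
    for prop, (dom, rng) in onto.properties.items():
        # sorted() copies the set, so adding instances while looping is safe.
        for ind in sorted(onto.instances.get(dom, ())):
            reply = oracle(PROPERTY_TEMPLATE.format(rng=rng, ind=ind, prop=prop))
            for value in parse_list(reply):
                onto.instances.setdefault(rng, set()).add(value)
                onto.relations.add((ind, prop, value))
    return onto


def fake_oracle(prompt: str) -> str:
    """Canned replies standing in for an LLM (purely illustrative data)."""
    canned = {
        CLASS_TEMPLATE.format(cls="Recipe"): "- Margherita pizza\n- Caprese salad",
        CLASS_TEMPLATE.format(cls="Ingredient"): "- Tomato",
    }
    if prompt in canned:
        return canned[prompt]
    if "Margherita pizza" in prompt:
        return "- Tomato\n- Mozzarella\n- Basil"
    if "Caprese salad" in prompt:
        return "Tomato\nMozzarella"
    return ""


onto = populate(
    Ontology(
        classes=["Recipe", "Ingredient"],
        properties={"hasIngredient": ("Recipe", "Ingredient")},
    ),
    fake_oracle,
)
```

Keeping the oracle behind a plain callable means different LLMs can be swapped in without touching the population loop, and the resulting `instances` and `relations` remain available for an expert to keep, adjust, discard, or complement.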
