Ontologies of research topics are crucial for structuring scientific knowledge, enabling scientists to navigate vast amounts of research, and forming the backbone of intelligent systems such as search engines and recommender systems. However, manually creating these ontologies is expensive and slow, and often yields outdated, overly general representations. As a solution, researchers have been investigating ways to automate or semi-automate the process of generating these ontologies. This paper offers a comprehensive analysis of the ability of large language models (LLMs) to identify semantic relationships between research topics, a critical step in the development of such ontologies. To this end, we developed a gold standard based on the IEEE Thesaurus for evaluating the task of identifying four types of relationships between pairs of topics: broader, narrower, same-as, and other. Our study evaluates the performance of seventeen LLMs, which differ in scale, accessibility (open vs. proprietary), and model type (full vs. quantised), while also assessing four zero-shot reasoning strategies. Several models achieved outstanding results, including Mixtral-8x7B, Dolphin-Mistral-7B, and Claude 3 Sonnet, with F1-scores of 0.847, 0.920, and 0.967, respectively. Furthermore, our findings demonstrate that smaller, quantised models, when optimised through prompt engineering, can deliver performance comparable to that of much larger proprietary models while requiring significantly fewer computational resources.