The increasing volume of scholarly publications requires advanced tools for efficient knowledge discovery and management. This paper introduces ongoing work on a system that uses Large Language Models (LLMs) for the semantic extraction of key concepts from scientific documents. Our research, conducted within the German National Research Data Infrastructure for and with Computer Science (NFDIxCS) project, seeks to support the FAIR (Findable, Accessible, Interoperable, and Reusable) principles in scientific publishing. We outline our exploratory work, which uses in-context learning with various LLMs to extract concepts from papers, initially focusing on the Business Process Management (BPM) domain. A key advantage of this approach is its potential for rapid domain adaptation, often requiring few or even zero examples to define extraction targets for new scientific fields. We conducted technical evaluations to compare the performance of commercial and open-source LLMs and created an online demo application to collect feedback in an initial user study. Additionally, we gathered insights from the computer science research community through user stories collected during a dedicated workshop; these insights actively guide the ongoing development of our future services. These services aim to support structured literature reviews, concept-based information retrieval, and the integration of extracted knowledge into existing knowledge graphs.
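To make the in-context learning approach mentioned above concrete, the following is a minimal zero-shot sketch of LLM-based concept extraction. It is illustrative only: the prompt wording, the model name, the JSON output schema, and the `extract_concepts` helper are assumptions for this example and do not reflect the project's actual pipeline or evaluation setup.

```python
# Minimal zero-shot concept-extraction sketch (illustrative assumptions only;
# prompt, model name, and output schema are not the project's actual pipeline).
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def extract_concepts(paper_text: str, domain: str = "Business Process Management") -> list[str]:
    """Ask an LLM to list the key domain concepts mentioned in a paper excerpt."""
    prompt = (
        f"You are an expert in {domain}. "
        "Extract the key scientific concepts from the following paper excerpt. "
        "Return them as a JSON array of strings and nothing else.\n\n"
        f"{paper_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable commercial or open-source chat model could be swapped in
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output eases side-by-side model comparison
    )
    # Naive parsing: a production pipeline would validate and repair malformed JSON.
    return json.loads(response.choices[0].message.content)


if __name__ == "__main__":
    excerpt = "We propose a method for mining declarative process models from event logs ..."
    print(extract_concepts(excerpt))
```

Adapting such a prompt to a new scientific field would mainly require changing the domain description and, for few-shot use, appending a handful of annotated example passages to the prompt.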