This paper describes our participation in the Shared Task on Software Mentions Disambiguation (SOMD), focusing on improving relation extraction in scholarly texts with Generative Language Models (GLMs) through single-choice question answering (QA). The methodology relies on the in-context learning capabilities of GLMs to extract software-related entities and their descriptive attributes, such as distribution information. Our approach combines Retrieval-Augmented Generation (RAG) techniques with GLMs for Named Entity Recognition (NER) and Attributive NER, and then identifies relationships between the extracted software entities, providing a structured solution for analysing software citations in academic literature. We describe the approach in detail and show how using GLMs in a single-choice QA paradigm can enhance information extraction (IE) methodologies. Our participation in the SOMD shared task highlights the importance of precise software citation practices and demonstrates our system's ability to address the challenges of disambiguating software mentions and extracting the relationships between them. This lays the groundwork for future research and development in this field.