Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and cannot populate arbitrarily complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a knowledge extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts, and to return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of the use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-to-disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language-interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly available databases and ontologies external to the LLM. SPIRES is available as part of the open-source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
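The recursive, schema-driven extraction described above can be illustrated with a minimal sketch. This is not the OntoGPT implementation or its API: the `Field` class, the prompt wording, the `extract` function, the `fake_complete` stub that stands in for a GPT-3+ call, and the `EX:` identifiers are all hypothetical and chosen only for illustration. The sketch assumes the model answers in a simple `name: value` line format, that multiple values are separated by semicolons, and, as a simplification, that every leaf string is passed to a grounding function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Field:
    name: str
    range: str = "string"      # "string", or the name of another class in the schema
    multivalued: bool = False

Schema = Dict[str, List[Field]]

def build_prompt(fields: List[Field], text: str) -> str:
    """Build a fill-in-the-template prompt from the fields of one schema class."""
    lines = [
        "Extract the following fields from the text, one per line, as 'name: value'.",
        "Separate multiple values with semicolons.",
        "",
    ]
    lines += [f"{f.name}: <{f.name}>" for f in fields]
    lines += ["", "Text:", text]
    return "\n".join(lines)

def parse_response(response: str) -> Dict[str, str]:
    """Parse 'name: value' lines into a dict; lines without a colon are ignored."""
    result = {}
    for line in response.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            result[key.strip()] = value.strip()
    return result

def extract(text: str, cls: str, schema: Schema,
            complete: Callable[[str], str],
            ground: Callable[[str], str]) -> dict:
    """Query the model for one class, then recurse into fields whose range is a class."""
    fields = schema[cls]
    raw = parse_response(complete(build_prompt(fields, text)))
    out = {}
    for f in fields:
        value = raw.get(f.name, "")
        if not value:
            continue
        if f.multivalued:
            parts = [p.strip() for p in value.split(";") if p.strip()]
        else:
            parts = [value]
        if f.range in schema:
            # Nested class: interrogate the model again on the sub-span.
            items = [extract(p, f.range, schema, complete, ground) for p in parts]
        else:
            # Leaf value: try to map it to an identifier.
            items = [ground(p) for p in parts]
        out[f.name] = items if f.multivalued else items[0]
    return out

# --- Toy demonstration with a canned stand-in for the LLM (hypothetical) ---
schema: Schema = {
    "Recipe": [Field("label"), Field("ingredients", "Ingredient", multivalued=True)],
    "Ingredient": [Field("food"), Field("amount")],
}

def fake_complete(prompt: str) -> str:
    """Canned responses; a real system would call an LLM here."""
    text = prompt.split("Text:\n", 1)[1]
    if "ingredients:" in prompt:
        return "label: pancakes\ningredients: 2 cups flour; 1 cup milk"
    amount, unit, food = text.split(" ", 2)
    return f"food: {food}\namount: {amount} {unit}"

# Placeholder vocabulary; the EX: identifiers are made up, not real ontology IDs.
vocabulary = {"flour": "EX:0001", "milk": "EX:0002"}

def ground(label: str) -> str:
    """Return an identifier if the label is known, else the label unchanged."""
    return vocabulary.get(label.lower(), label)

result = extract("Pancakes: mix 2 cups flour with 1 cup milk.",
                 "Recipe", schema, fake_complete, ground)
print(result)
```

In this sketch the top-level call asks only for the fields of `Recipe`; each semicolon-separated ingredient string is then fed back through `extract` with the `Ingredient` class, which is what lets a nested schema be populated without a single monolithic prompt. Labels that the lookup table does not recognize are left as plain strings rather than being assigned an identifier.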