Recent advances in Natural Language Processing, particularly the development of large-scale language models pretrained on vast amounts of data, are creating new opportunities for Knowledge Engineering. In this paper, we investigate the use of large language models (LLMs) in both zero-shot and in-context learning settings to extract procedures from unstructured PDF text in an incremental question-answering fashion. In particular, we use the current state-of-the-art GPT-4 (Generative Pre-trained Transformer 4) model, together with two variations of in-context learning: one supplies an ontology with definitions of procedures and steps, and the other supplies a limited number of few-shot examples. The findings highlight both the promise of this approach and the value of the in-context learning customisations. These customisations have the potential to significantly ease the challenge of obtaining sufficient training data, a hurdle often encountered in deep learning-based Natural Language Processing techniques for procedure extraction.
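To make the incremental question-answering setup concrete, the following is a minimal sketch of how the three prompt settings (zero-shot, ontology-augmented, and few-shot) might be assembled. It is not the prompt format used in the paper: the ontology wording, the question phrasing, and the names `ONTOLOGY`, `Example`, `build_prompt`, `extract_procedure`, and `ask` are all hypothetical, and `ask` stands in for whatever call sends a prompt to the model and returns its answer.

```python
from dataclasses import dataclass

# Hypothetical ontology definitions, written for illustration only.
ONTOLOGY = {
    "Procedure": "A named sequence of steps carried out to achieve a goal.",
    "Step": "A single action within a procedure, performed in a fixed order.",
}


@dataclass
class Example:
    """One worked few-shot sample: a passage, a question, and its answer."""
    text: str
    question: str
    answer: str


def build_prompt(text, question, ontology=None, examples=()):
    """Assemble the prompt for one question about one text passage.

    With no ontology and no examples this is the zero-shot setting;
    supplying either one gives an in-context learning variant.
    """
    parts = []
    if ontology:
        lines = [f"- {term}: {definition}" for term, definition in ontology.items()]
        parts.append("Definitions:\n" + "\n".join(lines))
    for ex in examples:
        parts.append(f"Text: {ex.text}\nQuestion: {ex.question}\nAnswer: {ex.answer}")
    parts.append(f"Text: {text}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)


def extract_procedure(text, ask, ontology=None, examples=()):
    """Ask two questions in sequence; the second depends on the first answer.

    `ask` is an assumed callable (prompt -> answer string) wrapping the model.
    """
    name = ask(build_prompt(
        text, "Which procedure does the text describe?", ontology, examples
    )).strip()
    steps_raw = ask(build_prompt(
        text, f"List the steps of the procedure '{name}', one per line.",
        ontology, examples,
    ))
    steps = [line.strip() for line in steps_raw.splitlines() if line.strip()]
    return {"procedure": name, "steps": steps}
```

Passing the model call in as `ask` keeps the sketch independent of any particular client library, and it lets the same extraction loop run in all three settings by changing only the `ontology` and `examples` arguments. For simplicity the same few-shot examples are reused for both questions; a real setup would likely pair each question type with its own examples.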