Information extraction (IE) aims to extract structured knowledge from plain natural language text. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. As a result, numerous works have proposed integrating LLMs into IE tasks under a generative paradigm. To provide a comprehensive and systematic review of LLM efforts for IE tasks, we survey the most recent advancements in this field. We first present an extensive overview that categorizes these works by IE subtask and technique. We then empirically analyze the most advanced methods and identify emerging trends in IE with LLMs. Based on this review, we identify several technical insights and promising research directions that deserve further exploration in future studies. We maintain a public repository and continually update related works and resources on GitHub (\href{https://github.com/quqxui/Awesome-LLM4IE-Papers}{LLM4IE repository}).