Document-level information extraction (IE) is a crucial task in natural language processing (NLP). This paper presents a systematic review of recent document-level IE literature. In addition, we conduct a thorough error analysis of current state-of-the-art algorithms and identify their limitations as well as the remaining challenges for document-level IE. According to our findings, labeling noise, entity coreference resolution, and a lack of reasoning severely affect the performance of document-level IE. The objective of this survey is to provide further insight and help NLP researchers improve document-level IE performance.