Document-Level Zero-Shot Relation Extraction (DocZSRE) aims to predict unseen relation labels in text documents without prior training on those specific relations. Existing approaches rely on Large Language Models (LLMs) to generate synthetic data for unseen labels, which poses challenges for low-resource languages such as Malaysian English. These challenges include capturing local linguistic nuances and the risk of factual inaccuracies in LLM-generated data. This paper introduces Document-Level Zero-Shot Relation Extraction with Entity Side Information (DocZSRE-SI) to address the limitations of existing DocZSRE approaches. The DocZSRE-SI framework leverages Entity Side Information, such as Entity Mention Descriptions and Entity Mention Hypernyms, to perform ZSRE without depending on LLM-generated synthetic data. The proposed low-complexity model achieves an average improvement of 11.6% in macro F1-score over baseline models and existing benchmarks. By utilizing Entity Side Information, DocZSRE-SI offers a robust and efficient alternative to error-prone, LLM-based methods, demonstrating significant advances in handling low-resource languages and linguistic diversity in relation extraction tasks. This research provides a scalable and reliable solution for ZSRE, particularly in contexts such as Malaysian English news articles, where traditional LLM-based approaches fall short.