Document-Level Zero-Shot Relation Extraction (DocZSRE) aims to predict unseen relation labels in text documents without prior training on those specific relations. Existing approaches rely on Large Language Models (LLMs) to generate synthetic data for unseen labels, which poses challenges for low-resource languages such as Malaysian English. These challenges include capturing local linguistic nuances and the risk of factual inaccuracies in LLM-generated data. This paper introduces Document-Level Zero-Shot Relation Extraction with Entity Side Information (DocZSRE-SI) to address the limitations of existing DocZSRE approaches. The DocZSRE-SI framework leverages Entity Side Information, such as Entity Mention Descriptions and Entity Mention Hypernyms, to perform ZSRE without depending on LLM-generated synthetic data. The proposed low-complexity model achieves an average improvement of 11.6% in macro F1-score over baseline models and existing benchmarks. By utilizing Entity Side Information, DocZSRE-SI offers a robust and efficient alternative to error-prone, LLM-based methods, demonstrating significant advances in handling low-resource languages and linguistic diversity in relation extraction tasks. This research provides a scalable and reliable solution for ZSRE, particularly in contexts such as Malaysian English news articles, where traditional LLM-based approaches fall short.