This paper presents ReverseNER, a framework for overcoming the limitations of large language models (LLMs) in zero-shot Named Entity Recognition (NER), particularly where certain entity types have ambiguous boundaries. ReverseNER tackles this challenge by constructing a reliable example library through a reversed NER process: rather than beginning with sentences, it uses an LLM to generate entities from their type definitions and then expands them into full sentences. During sentence generation, the LLM is guided to replicate the structure of a 'feature sentence' extracted from the task sentences by clustering, yielding well-annotated sentences with clearly labeled entities that remain semantically and structurally similar to the task sentences. Once the example library is constructed, the method selects the most semantically similar labeled examples for each task sentence to support the LLM's inference. We also propose an entity-level self-consistency scoring mechanism to further improve NER performance with LLMs. Experiments show that ReverseNER significantly outperforms traditional zero-shot NER with LLMs and surpasses several few-shot methods, marking a notable improvement in NER for domains with limited labeled data.
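The entity-level self-consistency scoring mentioned above can be sketched as majority voting over multiple LLM inference passes: an entity is kept only if enough passes agree on it. The sketch below is a minimal, hypothetical illustration, not the paper's exact formulation; entities are assumed to be (span, type) pairs, and the retention threshold `min_ratio` is illustrative.

```python
from collections import Counter

def self_consistency_filter(runs, min_ratio=0.5):
    """Keep entities proposed by at least `min_ratio` of the LLM runs.

    runs: list of iterables of (span, type) pairs, one per inference pass.
    Returns the set of entities meeting the agreement threshold.
    """
    n = len(runs)
    # Count each entity at most once per run, then vote across runs.
    counts = Counter(entity for run in runs for entity in set(run))
    return {entity for entity, c in counts.items() if c / n >= min_ratio}

# Three hypothetical passes over the same sentence:
runs = [
    {("Paris", "LOC"), ("ACME", "ORG")},
    {("Paris", "LOC")},
    {("Paris", "LOC"), ("ACME", "ORG")},
]
kept = self_consistency_filter(runs)  # both entities clear the 0.5 threshold
```

A real scoring mechanism could weight votes by model confidence rather than counting agreement uniformly; plain voting is shown here only to make the entity-level granularity concrete.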