Nested named entity recognition identifies entities contained within other entities but requires expensive multi-level annotation. While flat NER corpora are abundant, nested resources remain scarce. We investigate whether models can learn nested structure from flat annotations alone, evaluating four approaches: string inclusions (substring matching), entity corruption (pseudo-nested data), flat neutralization (reducing the false-negative signal), and a hybrid fine-tuned + LLM pipeline. On NEREL, a Russian benchmark with 29 entity types where 21% of entities are nested, our best combined method achieves 26.37% inner F1, closing 40% of the gap to full nested supervision. Code is available at https://github.com/fulstock/Learning-from-Flat-Annotations.
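The string-inclusions idea can be sketched as follows: project the surface strings of known flat entities into longer flat entities as pseudo-nested inner mentions. This is a minimal illustrative sketch under our own assumptions about data layout and matching; function names and the example entities are hypothetical, not the paper's actual implementation.

```python
# Hedged sketch of substring-matching ("string inclusions"):
# find flat entity strings occurring inside longer flat entities
# and emit them as pseudo-nested inner mention candidates.
# Data layout (start, end, label, text) is an illustrative assumption.

def string_inclusion_candidates(entities):
    """entities: list of (start, end, label, text) flat annotations.

    Returns pseudo-nested inner spans found by substring matching."""
    candidates = []
    surface_forms = {(e[3], e[2]) for e in entities}  # (text, label) pairs
    for start, end, label, text in entities:
        for form, form_label in surface_forms:
            # A different entity's surface string inside this entity's text
            # suggests a nested inner mention at the matched offset.
            if form != text and form in text:
                offset = text.find(form)
                candidates.append((start + offset,
                                   start + offset + len(form),
                                   form_label, form))
    return candidates

# Hypothetical example: "Moscow" (CITY) nested inside an ORG mention.
flat = [
    (0, 23, "ORG", "Moscow State University"),
    (40, 46, "CITY", "Moscow"),
]
print(string_inclusion_candidates(flat))  # → [(0, 6, 'CITY', 'Moscow')]
```

In practice such candidates are noisy (a string match does not guarantee a true inner entity), which is why the abstract pairs this heuristic with corruption, neutralization, and LLM-based methods.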