Identifying which text spans refer to entities -- mention detection -- is both foundational for information extraction and a known performance bottleneck. We introduce ToMMeR, a lightweight model (<300K parameters) that probes mention detection capabilities from early LLM layers. Across 13 NER benchmarks, ToMMeR achieves 93\% recall zero-shot, with over 90\% precision as measured by an LLM judge, showing that ToMMeR rarely produces spurious predictions despite its high recall. Cross-model analysis reveals that diverse architectures (14M--15B parameters) converge on similar mention boundaries (Dice >75\%), confirming that mention detection emerges naturally from language modeling. When extended with span classification heads, ToMMeR achieves near-SOTA NER performance (80--87\% F1 on standard benchmarks). Our work provides evidence that structured entity representations exist in early transformer layers and can be efficiently recovered with minimal parameters.
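To make the cross-model agreement figure concrete, the following is a minimal sketch of a token-level Dice overlap between the mention spans predicted by two models on the same sentence; the span indices and the token-level matching convention are illustrative assumptions, not the paper's actual evaluation code.

```python
# Minimal sketch (not the paper's implementation): token-level Dice agreement
# between the mention spans predicted by two models on the same sentence.
# Spans are hypothetical (start, end) inclusive token indices.

def span_tokens(spans):
    """Expand (start, end) spans into the set of covered token positions."""
    return {i for start, end in spans for i in range(start, end + 1)}

def dice(spans_a, spans_b):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) over covered tokens."""
    a, b = span_tokens(spans_a), span_tokens(spans_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Example: two models that mostly agree on mention boundaries.
model_1 = [(0, 1), (5, 7)]   # e.g. "New York", "European Central Bank"
model_2 = [(0, 1), (5, 6)]   # second mention truncated by one token
print(dice(model_1, model_2))  # 0.888..., i.e. above the 75% agreement threshold
```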